
(57) The present invention provides DNA molecules
that constitute fragments of the genome of a plant, and
polypeptides encoded thereby. The DNA molecules are
useful for specifying a gene product in cells, either as a
promoter or as a protein coding sequence or as a UTR
or as a 3' termination sequence, and are also useful in
controlling the behavior of a gene in the chromosome,
in controlling the expression of a gene or as tools for
genetic mapping, recognizing or isolating identical or
related DNA fragments, or identification of a particular
individual organism, or for clustering of a group of
organisms with a common trait.

Arabidopsis DNA is used in the present experiment,
but the procedure is a general one.
FIELD OF THE INVENTION

[...] and proteins.

DESCRIPTION OF THE RELATED ART

SUMMARY OF THE INVENTION

[...] sequences of [...]

[0003] The present invention comprises [...] cDNA encompassing [...] with other functions and/or intergenic regions, [...] Arabidopsis thaliana, and other [...] fragments (SDFs), from different plant species, particularly [...], and polypeptides or proteins derived therefrom [...]



[0006] [...] steps:
[...] artificial chromosomes or other types of [...] is, preferably, functionally integrated [...] a coding region from an SDF might be [...] a promoter that is functional in a plant [...]

[0010] The present invention also resides in host cells, including [...], [...] harbor constructs such as described [...] of the constructs, [...] by regulation of expression of one or more endogenous [...] expression of specific genes in plants by, without limitation, (1) inserting into a host [...]

BRIEF DESCRIPTION OF THE TABLES 

[...] longest cDNA obtained, [...] REF Tables.

I. cDNA Sequence

A. 5' UTR

B. Coding Sequence 

C. 3' UTR 

II. Genomic Sequence 



A. Exons 

B. Introns 

C. Promoters 

III. Link of cDNA Sequences to Clone IDs 

IV. Multiple Transcription Start Sites 

V. Polypeptide Sequences 

A. Signal Peptide 

B. Domains 

C. Related Polypeptides 

VI. Related Polynucleotide Sequences 
I. cDNA SEQUENCE

[0013] The REF Tables [...]

A. 5' UTR

[0014] The location of the 5' UTR can be determined [...] by comparing the most 5' MLS sequence with the corresponding [...]
B. Coding Sequence

[...]

C. 3' UTR

[...] most 3' MLS sequence with the corresponding [...]

[Schematic of gene structure: promoter, UTR, exons, introns, translational start site and stop codon]

[...] is shown below:

gi No. 47000:

A. EXON SEQUENCES

i. INITIAL EXON

[...] To determine the location of the initial exon, information from the

(1) polypeptide sequence section; 
(2) cDNA polynucleotide section; and

[...] was located, will represent the end of the initial exon. In some cases, the initial exon will end with a stop codon, when the initial exon is the only exon.

[0022] In the case when sequences representing the MLS are in the positive strand of the corresponding genomic sequence, the last base will be a larger number than the first base. When the sequences representing the MLS are in the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first base.

ii. INTERNAL EXONS

[0023] Except for the regions that comprise the 5' and 3' UTRs, initial exon, and terminal exon, the remaining genomic regions that match the MLS sequence are the internal exons. Specifically, the bases defining the boundaries of the remaining regions also define the intron/exon junctions of the internal exons.

iii. TERMINAL EXON

[0024] As with the initial exon, the location of the terminal exon is determined with information from the

(1) polypeptide sequence section;

(2) cDNA polynucleotide section; and

(3) the genomic sequence section

of the REF Tables. The polypeptide section will indicate where the stop codon is located in the MLS sequence. The MLS sequence can be matched to the corresponding genomic sequence. Based on the match between MLS and corresponding genomic sequences, the location of the stop codon can be determined in one of the regions of the genomic sequence. The location of this stop codon is the end of the terminal exon. Generally, the first base of the exon of the corresponding genomic region that matches the cDNA sequence, in which the stop codon was located, will represent the beginning of the terminal exon. In some cases, the translation start site will represent the start of the terminal exon, which will be the only exon.

[0025] In the case when the MLS sequences are in the positive strand of the corresponding genomic sequence, the last base will be a larger number than the first base. When the MLS sequences are in the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first base.
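The initial, internal and terminal exon rules above amount to clipping each matched MLS/genomic region at the start and stop codons. The Python sketch below illustrates that reading only; it is not the procedure used to build the REF Tables. It assumes the MLS-to-genome match is available as ungapped blocks listed in MLS order, with 1-based inclusive coordinates, and that the MLS positions of the start and stop codons come from the polypeptide section. `Block`, `to_genomic` and `coding_exons` are hypothetical names. Because negative-strand coordinates run downward, each exon's last base comes out smaller than its first, as described above.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One ungapped match between the MLS and the genomic sequence (1-based, inclusive)."""
    mls_start: int
    mls_end: int
    gen_first: int  # genomic base aligned to mls_start
    gen_last: int   # genomic base aligned to mls_end (smaller than gen_first on the negative strand)


def to_genomic(block: Block, mls_pos: int) -> int:
    """Map an MLS position inside `block` to its genomic coordinate."""
    step = 1 if block.gen_last >= block.gen_first else -1  # negative strand counts down
    return block.gen_first + step * (mls_pos - block.mls_start)


def coding_exons(blocks: List[Block], start_pos: int, stop_end: int) -> List[Tuple[int, int]]:
    """Return (first base, last base) genomic pairs for the initial, internal and terminal exons.

    start_pos: MLS position of the first base of the start codon.
    stop_end:  MLS position of the last base of the stop codon.
    """
    exons = []
    for b in blocks:
        if b.mls_end < start_pos or b.mls_start > stop_end:
            continue  # block lies entirely within the 5' or 3' UTR
        first = to_genomic(b, max(b.mls_start, start_pos))  # initial exon starts at the start codon
        last = to_genomic(b, min(b.mls_end, stop_end))      # terminal exon ends at the stop codon
        exons.append((first, last))
    if not exons:
        raise ValueError("start/stop codon positions fall outside all matched regions")
    return exons


# Hypothetical three-block match on the positive strand.
plus = [Block(1, 150, 1001, 1150), Block(151, 300, 1401, 1550), Block(301, 500, 1801, 2000)]
print(coding_exons(plus, start_pos=51, stop_end=400))
# [(1051, 1150), (1401, 1550), (1801, 1900)]
```

When the start and stop codons fall in the same block, the loop returns a single exon running from the start codon to the stop codon, which corresponds to the single-exon case mentioned for both the initial and the terminal exon.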

B. INTRON SEQUENCES

[0026] In addition, the introns corresponding to the MLS are defined by identifying the genomic sequence located between the regions where the genomic sequence comprises exons. Thus, introns are defined as starting one base downstream of a genomic region comprising an exon and ending one base upstream of a genomic region comprising an exon.
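A minimal Python sketch of the intron rule, assuming the exons are already known as (first base, last base) genomic pairs listed in transcript order; the function name is hypothetical. Each intron begins one base past the end of one exon and ends one base before the start of the next; on the negative strand "past" means a smaller coordinate.

```python
from typing import List, Tuple


def introns_from_exons(exons: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the introns lying between consecutive exons.

    Exons are (first base, last base) pairs in transcript order. On the negative
    strand the coordinates decrease, so the one-base offsets use a step of -1.
    """
    introns = []
    for (_, prev_last), (next_first, _) in zip(exons, exons[1:]):
        step = 1 if next_first > prev_last else -1
        introns.append((prev_last + step, next_first - step))
    return introns


print(introns_from_exons([(100, 200), (301, 400), (501, 650)]))  # [(201, 300), (401, 500)]
print(introns_from_exons([(650, 501), (400, 301)]))               # [(500, 401)]
```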

C. PROMOTER SEQUENCES

[0027] As indicated below, promoter sequences corresponding to the MLS are defined as sequences upstream of 
the first exon; more usually, as sequences upstream of the first of multiple transcription start sites; even more usually 
as sequences about 2,000 nucleotides upstream of the first of multiple transcription start sites. 
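The promoter definition can be illustrated with a short Python sketch that extracts the roughly 2,000 nucleotides preceding the first of several transcription start sites. The following are assumptions, not features of the REF Tables: the genomic sequence is a plain string, TSS positions are 1-based genomic coordinates, and for a negative-strand gene the upstream window lies at higher coordinates and is reverse-complemented so that it reads 5' to 3'. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def most_upstream_tss(tss_positions, strand="+"):
    """First of multiple transcription start sites, as a genomic coordinate."""
    return min(tss_positions) if strand == "+" else max(tss_positions)


def promoter_region(genome: str, tss: int, strand: str = "+", length: int = 2000) -> str:
    """Up to `length` nt immediately upstream of a 1-based TSS, written 5'->3' on the gene's strand."""
    if strand == "+":
        start = max(0, tss - 1 - length)  # 0-based index of the first upstream base
        return genome[start:tss - 1]
    # Negative strand: upstream lies at higher coordinates, so take that window
    # and reverse-complement it.
    return genome[tss:tss + length].translate(COMPLEMENT)[::-1]


genome = "AAAACCCCGGGGTTTT"  # toy sequence
print(promoter_region(genome, most_upstream_tss([11, 9, 13]), "+", length=4))  # CCCC
print(promoter_region(genome, 4, "-", length=6))                               # CCGGGG
```

The window is simply truncated when the TSS lies closer than `length` bases to the end of the available sequence.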


III. LINK of cDNA SEQUENCES to CLONE IDs 

[0028] As noted above, the REF tables identify the cDNA clone(s) that relate to each MLS. The MLS sequence can be longer than the sequences included in the cDNA clones. In such a case, the REF table indicates the region of the MLS that is included in the clone. If either the 5' or the 3' terminus of the cDNA clone sequence is the same as that of the MLS sequence, no mention will be made.

IV. Multiple Transcription Start Sites 

[0029] Initiation of transcription can occur at a number of sites of the gene. The REF tables indicate the possible multiple transcription start sites for each gene. In the REF tables, the location of the transcription start sites can be either a positive or negative number. The positions indicated by positive numbers refer to the transcription start sites as located in the MLS sequence. The negative numbers indicate the transcription start site within the genomic sequence [...] corresponding genomic sequence [...] subsection.

[...] one polypeptide sequence.

[...] (where present) a name for the polypeptide domain.

VI. Related Polynucleotide Sequences



Abbreviation          Description

MaxLe                 Maximum Length Sequence
rel to                Related to Clone ID numbers
Pub gDNA              Public Genomic DNA
gi No.                gi number
Gen. seq. in cDNA     Genomic Sequence in cDNA (Each region for a single gene
                      prediction is listed on a separate line. In the case of
                      [...] are separated by a blank line)
[...]                 [...] single [...] cDNA sequence



EP 1 033 405 A2 

(continued) 



Abbreviation                     Description

Pat. Appln. SEQ ID NO            Patent Application SEQ ID NO:
Ceres SEQ ID NO: 1673877         Ceres SEQ ID NO: [...] number
SEQ # w. TSS                     [...] which are listed
Clone ID #: # -> [...]           [...]
PolyP SEQ                        Polypeptide Sequence
 - Pat. Appln. SEQ ID NO         Patent Application SEQ ID NO:
 - Ceres SEQ ID NO               Ceres SEQ ID NO:
 - Loc. SEQ ID NO [...] nt.      [...]
(C) Pred. PP Nom. & Annot.       [...] (Title)
 - Loc. SEQ ID NO #:# -> # aa    [...] amino acid residues
(Dp) Rel. AA SEQ                 Related Amino Acid Sequences
[...]                            Name of Domain
 - Align. NO                     Alignment number
 - gi No                         gi number
[...]                            Description
 - % Idnt.                       Percent identity
 - Align. Len.                   Alignment Length
 - Loc. SEQ ID NO: # -> # aa     [...] residues



[...] DESCRIPTION OF THE INVENTION



IA. Probes, Primers and Substrates;
IB. Methods of Detection and Isolation;

B.1. Hybridization;

B.2. Methods of Mapping;

B.3. Southern Blotting;

B.4. Isolating cDNA from Related Organisms;

B.5. Isolating and/or Identifying Orthologous Genes;

IC. Methods of Inhibiting Gene Expression;

C.1. Antisense;

C.2. Ribozyme Constructs;

C.3. Chimeraplasts;

C.4. Co-Suppression;

C.5. Transcriptional Silencing;

C.6. Other Methods to Inhibit Gene Expression;

ID. Methods of Functional Analysis;

IE. Promoter Sequences and Their Use;
IF. UTRs and/or Intron Sequences and Their Use; and
IG. Coding Sequences and Their Use.

IIA. Native Polypeptides and Proteins

A.1. Antibodies

A.2. In Vitro Applications

IIB. Polypeptide Variants, Fragments and Fusions

B.1. Variants
B.2. Fragments
B.3. Fusions

[0040] The invention also includes [...]

IIIA. Suppression

A.1. Antisense
A.2. Ribozymes

[...]

[...] Expression of Genes Containing Dominant-Negative Mutations
IIIB. Enhanced Expression

B.1. Insertion of an Exogenous Gene

B.2. Promoter Modulation

[0041] The invention further [...]

IVA. Coding Sequences

IVB. Promoters
IVC. Signal Peptides

[0042] The invention still further relates to

V. Transformation Techniques

Definitions 

[...] locus in the organism. [...] silent allele can give rise to a product. [...]

[...] invention, alternatively spliced messages [...] introns and/or intron-exon junctions [...] constructs herein [...] at least [...]

[0045] Chimeric: The [...] and the coding sequence and/or other regulatory [...] plant genes, such as the maize ubiquitin-1 promoter, known to those of skill.

[...] Coordinately Expressed: The term "coordinately expressed," as used in the current invention, refers to [...] that are expressed at the same or a similar time and/or stage and/or under the same or similar environmental [...]

[...] Domain: Domains are fingerprints or signatures that can be used to characterize protein [...] proteins. Such fingerprints or signatures can comprise conserved (1) primary sequence, (2) secondary structure, and/or [...]. Generally, each domain has been associated with either a family of proteins or motifs. Typically, these families and/or motifs have been correlated with specific in-vitro and/or in-vivo activities. A domain can be any length, including the entirety of the sequence of a protein. Detailed descriptions of the domains, associated families and motifs, and correlated activities of the polypeptides of the instant invention are described below. Usually, the polypeptides with designated domain(s) can exhibit at least one activity that is exhibited by [...]

[...] polypeptide or protein sequence which is a natural part of a cell or organisms regenerated from said cell.

[...] Exogenous: "Exogenous," as referred to within, is any polynucleotide, polypeptide or protein sequence, [...]

[...] Salomon [...] (1996), Ishida et al., Nature Biotechnology 14:745 (1996), May [...] and the like. Such a plant containing the exogenous nucleic acid is referred to here as a T0 for the primary transgenic plant and T1 for the first generation. The term "exogenous" as used herein is also intended to [...]

[...] a particular spacing between particular components such as a promoter and a coding region [...]

[...] with a single hereditary unit with a genetic function (see SCHEMATIC [...]). Genes can include non-coding sequences that modulate the genetic function that include, but are not limited to, those that specify polyadenylation, transcriptional regulation, DNA conformation, chromatin conformation, extent and position of base [...] by "introns" (non-coding sequences), encode proteins. A gene's genetic function may require RNA expression or protein production, or may only require binding of proteins and/or nucleic acids without associated expression. In certain cases, genes adjacent to one another may share sequence in such a way that one gene will overlap the other. A gene can be found within the genome of an organism, artificial chromosome, plasmid, vector, [...]

[...] to each other in nature. For example, a promoter from corn is considered heterologous to an Arabidopsis [...] to a sequence encoding the corn receptor for the growth factor. Regulatory element sequences, such as UTRs or 3' end termination sequences that do not originate in nature from the same gene as the coding sequence originates from [...] corn gene operatively linked in a novel manner are heterologous. [...]
[...] Homologous Gene: In the current invention, "homologous gene" refers to a gene that shares sequence similarity with the gene of interest. This similarity may be in only a fragment of the sequence and often [...] domain such as, without limitation, a DNA binding domain, a domain with tyrosine kinase activity, or the like. The functional activities of homologous genes are not necessarily the same.

[...] Inducible Promoter: An "inducible promoter" in the context of the current invention refers to a promoter that is regulated under certain conditions, such as light, chemical concentration, protein [...] an organism, cell, or organelle, etc. A typical example of an inducible promoter, which can be utilized with the polynucleotides [...]

[...] location. Mutants of the current invention may [...] gene is transcribed. [...]

[0059] Orthologous Gene: In the current invention, "orthologous gene" refers to a second gene that encodes a gene product [...] of a first gene. The orthologous gene may also have a [...] functional domain or along the entire length [...]

[...] compared to the [...] determining the number of positions [...] dividing the number [...] the local homology algorithm of Smith and Waterman [...], by the [...] algorithm of Needleman and Wunsch [...], by the search for similarity method of [...] Lipman, Proc. [...], by computerized implementations of these algorithms (GAP, [...] in the Wisconsin Genetics [...]) [...] for comparison [...] preferably, at least 96%, 97%, 98% or 99% [...] of the SDF of the [...]

[0061] Plant Promoter: A plant promoter [...] instant invention or a [...] plant viruses, such [...] drive or facilitate transcription of a [...] instant invention. Such promoters need [...] usually located between 40 and 200 [...] transcription.

[...] application, refers to [...]

[0063] Public Sequence: The term "public sequence," as used in the context of the [...], refers to [...] accessible database. This term [...] both amino acid [...] databases on the [...]
[...] signal peptides are indicated in [...]

[0069] Specific Promoter: In the context of the current invention, [...] promoters that have a high preference for being induced in a specific [...] development of an organism. [...] more preferably at least [...] 10-fold, still more preferably at [...] transcription in any other tissue. [...] and/or tissue specific promoters of plant origin that can be used with the polynucleotides of [...], a promoter which is capable of driving gene transcription specifically in tapetum and only during anther development (Koltonow et al., Plant Cell 2:1201 (1990); [...], Plant Mol. Biol. 27:237 (1995); RCc2 and RCc3, promoters that direct root-specific [...]. Examples of tissue-specific [...] include those from [...] encoding storage [...]

[0070] Stringency: "Stringency," as used herein, [...] or wash conditions. Stringency [...] and salt concentration, organic solvent [...] is typically compared by the parameter Tm, [...] Tm. High stringency conditions are those [...] in the hybridization [...]

[...] of hybridization conditions to Tm (in °C) is expressed in the mathematical equation



$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\mathrm{G{+}C}) - \frac{600}{N} \qquad (1)$$



[...] than 500 nucleotides, and for conditions that include an organic solvent (formamide).

$$T_m = 81.5 + 16.6\,\log_{10}\left\{\frac{[\mathrm{Na}^+]}{1 + 0.7\,[\mathrm{Na}^+]}\right\} + 0.41\,(\%\mathrm{G{+}C}) - \frac{500}{L} - 0.63\,(\%\mathrm{formamide}) \qquad (2)$$

where L is the length of the probe [...]

[...] Techniques in Biochemistry and [...]. [...] of equation (2) is affected by the nature [...] 10-15 °C higher than calculated, for [...] for each 1% decrease in homology [...]

[...] detection of identical genes or related family members. [...] hybridizations according to the present invention [...]

[0071] Equation (2) [...] time to achieve equilibrium. The [...]

[...] the salt and temperature conditions [...] within the ranges stated [...]. [...] A composition containing A is substantially free of B when at least 85% by weight [...]
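Equations (1) and (2) from the Stringency definition can be evaluated with the short Python sketch below. It is illustrative only: sodium concentration is in mol/L, %G+C and %formamide are percentages, N and L are probe lengths in nucleotides, and the salt term of equation (1) is written with a positive sign, as in equation (2), because Tm rises with sodium concentration. The function names are hypothetical.

```python
import math


def tm_equation_1(na_molar: float, gc_percent: float, probe_len: int) -> float:
    """Equation (1): Tm in degrees C for a short probe of length N."""
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / probe_len


def tm_equation_2(na_molar: float, gc_percent: float, probe_len: int,
                  formamide_percent: float = 0.0) -> float:
    """Equation (2): Tm in degrees C for longer probes, optionally with formamide."""
    salt_term = 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return (81.5 + salt_term + 0.41 * gc_percent
            - 500.0 / probe_len - 0.63 * formamide_percent)


print(round(tm_equation_1(0.1, 50, 20), 1))       # 55.4
print(round(tm_equation_2(0.1, 50, 500, 50), 1))  # 52.4
```

In equation (2) each 1% of formamide lowers the calculated Tm by 0.63 °C, so 50% formamide lowers it by 31.5 °C relative to the same conditions without formamide.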

DETAILED DESCRIPTION OF THE INVENTION 

[...] amplification product.

I.A. Probes, Primers and Substrates

[...] distant related sequences [...] for subsequent cloning purposes. [...] nucleotides [...]

I.B. Methods of Detection and Isolation



[0086] The polynucleotides of the invention can be utilized in a number of methods known to those skilled in the art as probes and/or primers to isolate and detect polynucleotides, including, without limitation: Southerns, Northerns, branched DNA hybridization assays, polymerase chain reaction, and microarray assays, and variations thereof. Specific methods given by way of example, and discussed below, include:

Hybridization 
Methods of Mapping 
Southern Blotting 

Isolating cDNA from Related Organisms 
Isolating and/or Identifying Orthologous Genes. 

Also, the nucleic acid molecules of the invention can be used in other methods, such as high density oligonucleotide hybridization assays, described, for example, in U.S. Pat. Nos. 6,004,753; 5,945,306; 5,945,287; 5,945,30[...]; 5,919,686; 5,919,661; 5,919,627; 5,874,248; 5,871,973; 5,871,971; and 5,871,930; and PCT Pub. Nos. WO 9946380; WO 9933981; WO 9933870; WO 9931252; WO 9915658; WO 9906572; WO 9858052; WO 9958672; and WO 9810858.

B.1. Hybridization 

[0087] The isolated SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be used as probes and/or primers for detection and/or isolation of related polynucleotide sequences through hybridization. Hybridization of one nucleic acid to another constitutes a physical property that defines the subject SDF of the invention and the identified related sequences. Also, such hybridization imposes structural limitations on the pair. A good general discussion of the factors for determining hybridization conditions is provided by Sambrook et al. ("Molecular Cloning: A Laboratory Manual," 2nd ed., c. 1989 by Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; see esp. chapters 11 and 12). Additional considerations and details of the physical chemistry of hybridization are provided by G.H. Keller and M.M. Manak ("DNA Probes," 2nd Ed., pp. 1-25, c. 1993 by Stockton Press, New York, NY).

[0088] Depending on the stringency of the conditions under which these probes and/or primers are used, polynucleotides exhibiting a wide range of similarity to those in REF and SEQ TABLES 1 AND 2 can be detected or isolated. When the practitioner wishes to examine the result of membrane hybridizations under a variety of stringencies, an efficient way to do so is to perform the hybridization under a low stringency condition, then to wash the hybridization membrane under increasingly stringent conditions.

[0089] When using SDFs to identify orthologous genes in other species, the practitioner will preferably adjust the amount of target DNA of each species so that, as nearly as is practical, the same number of genome equivalents is present for each species examined. This prevents faint signals from species having large genomes, and thus small numbers of genome equivalents per mass of DNA, from being erroneously interpreted as absence of the corresponding gene in the genome.
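Matching genome equivalents across species is simple arithmetic once the haploid genome sizes are known. The Python sketch below assumes an average mass of about 650 g/mol per double-stranded base pair and uses made-up genome sizes; neither figure comes from this document, and the helper names are hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair


def genome_mass_pg(genome_size_bp: float) -> float:
    """Mass of one haploid genome equivalent, in picograms."""
    grams = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e12


def dna_to_load_ug(genome_sizes_bp: dict, genome_equivalents: float) -> dict:
    """Micrograms of genomic DNA per species containing the same number of genome equivalents."""
    return {
        species: genome_mass_pg(size) * genome_equivalents / 1e6
        for species, size in genome_sizes_bp.items()
    }


# Hypothetical genome sizes (bp), for illustration only.
sizes = {"species_small": 1.25e8, "species_large": 2.5e9}
for species, ug in dna_to_load_ug(sizes, genome_equivalents=1e6).items():
    print(f"{species}: {ug:.2f} ug")
# species_small: 0.13 ug
# species_large: 2.70 ug
```

A 20-fold difference in genome size translates directly into a 20-fold difference in the amount of DNA that has to be loaded for the same number of genome equivalents.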

[0090] The probes and/or primers of the instant invention can also be used to detect or isolate polynucleotides that are identical to the probes or primers. Two nucleic acid sequences or polypeptides are said to be "identical" if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below.
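Once two sequences are aligned for maximum correspondence (for example by a Smith-Waterman or Needleman-Wunsch style program), identity reduces to counting matched columns. The Python sketch below uses one common convention: matched positions divided by the number of columns in the comparison window, times 100. The treatment of gap columns is an assumption, since programs differ in how they score them, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that are already aligned.

    Columns where both sequences carry the same residue (and not a gap) count as
    matches; the count is divided by the number of columns and multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGTACGT", "ACGTACGT"))       # 100.0
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```

Under this convention, two sequences are "identical" in the sense defined above exactly when the function returns 100.0.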

[0091] Isolated polynucleotides within the scope of the invention also include allelic variants of the specific sequences presented in REF and SEQ TABLES 1 AND 2. The probes and/or primers of the invention can also be used to detect and/or isolate polynucleotides exhibiting at least 80% sequence identity with the sequences of REF and SEQ TABLES 1 AND 2 or fragments thereof.
[0092] With respect to nucleotide sequences, degeneracy of the genetic code provides the possibility to substitute at least one base of the base sequence of a gene with a different base without causing the amino acid sequence of the polypeptide produced from the gene to be changed. Hence, the DNA of the present invention may also have any base sequence that has been changed from a sequence in REF and SEQ TABLES 1 AND 2 by substitution in accordance with degeneracy of the genetic code. References describing codon usage include: Carels et al., J. Mol. Evol. 46:45 (1998) and Fennoy et al., Nucl. Acids Res. 21(23):5294 (1993).
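To illustrate the point about degeneracy, the Python sketch below builds the standard genetic code (NCBI translation table 1) and translates two coding sequences that differ only by synonymous substitutions. The example sequences and the `translate` helper are hypothetical; they are not taken from REF and SEQ TABLES 1 AND 2.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of 3; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[n:n + 3]] for n in range(0, len(dna), 3))


original = "ATGCTTGGA"    # Met-Leu-Gly
synonymous = "ATGTTAGGC"  # differs from `original` at 3 of 9 positions
print(translate(original), translate(synonymous))  # MLG MLG
```

The two nine-base sequences agree at only 6 of 9 positions (about 67%), yet both encode the same tripeptide.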

B.2. Mapping 

[0093] The isolated SDF DNA of the invention can be used to create various types of genetic and physical maps of 
the genome of corn, Arabidopsis, soybean, rice, wheat, or other plants. Some SDFs may be absolutely associated 
with particular phenotypic traits, allowing construction of gross genetic maps. While not all SDFs will immediately be 
[...] This procedure, however, is not limited [...]

[...] (1993) [...] probes contained within an [...] closely related [...] different chromosomes, and [...] by de Vicente and Tanksley (Genet[...]) [...] from the [...] identify QTLs and isolate specific alleles [...] and related [...] libraries of plant DNA fragments in [...] as probes [...] to the assignment of the large [...] studies of their sequence [...] SDF or similar sequences [...] unambiguously by [...] sequences to [...]. Subsequently [...] Research 7:1072-1084) [...] in this way allows [...] composition (e.g., Marra et al. (1997) [...]). The overlapping of DNA sequences [...] of a [...] multiple phenotypic traits.

B.3. Southern Blot Hybridization

[...] same or different species and/or orthologous genes [...] genomic DNA or cDNA. Given the resulting hybridization data, one of ordinary skill in the art could [...] isolate the correct DNA fragments by size, restriction sites, sequence and stated hybridization conditions [...] from a [...] within a species [...]

[...] identification and isolation of [...] particularly desirable because of their [...]. Many important crop traits, such as the solid content of tomatoes, result from the combined [...] of several genes residing at different loci in the genome. Generally, alleles at each of these loci can make [...] differences to the [...] with various combinations [...] isolating numerous alleles for each locus from [...] of alleles can be created [...]. Once a more favorable allele combination has [...]

[0104] The results from hybridizations of the SDFs [...] maps for the corresponding genomic regions.

[0106] Probes for Southern blotting to distinguish [...] members of a gene family [...] to 1,000 nucleotides long for identifying [...] when it is found [...]. [...] an entire corresponding gene in another species, the probe is more preferably the length of the gene, typically 2,000 to 10,000 nucleotides, but probes 50-1,000 nucleotides [...] might require [...] sequences that define the gene family.

[0108] For identifying corresponding genes [...], a preferable probe is a cDNA spanning the entire coding sequence, which allows [...] fragment of the gene to be identified. Probes for Southern blotting can easily be generated from SDFs [...] having the sequence at the ends of the SDF and using corn or Arabidopsis genomic DNA as a template [...]. Where the SDF includes sequence conserved among species, primers including the conserved [...] species, can be examined.

B.4. Isolating cDNA from Related Organisms

[...] The SDFs of the invention [...] or genomic DNA can be isolated. For [...] genomic library from the plant of interest can be [...] in detail by Sambrook et al., 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York) and by Ausubel et al., 1992 (Cu[...] Publishing, New York). [...] clones are plated out on appropriate bacterial [...]

[...] preferably adjusted to a less stringent condition (e.g., Tm-30 °C) [...] then subsequently [...] bacterial clones, the filters are processed through the denaturation [...] lambda library screening are followed. [...] can be constructed in a lambda vector [...] appropriate [...] methods known in the art [...] clones.

B.5. Isolating and/or Identifying Orthologous Genes

[...] probes and primers of the invention [...] gene. Orthologous genes have the same functional activity as [...] and in closely related species, [...] can be less than 75% identical, but tends to be at least 75% or at [...] orthologous genes, preferably at least 95% identical to the amino acid sequence of the re[...] conditions, preferably one containing as much as [...] to nucleic acids from a species of [...]. This condition is established [...] to Tm-48 °C (see below). Blots are then [...] It is preferable [...] that the wash stringency [...]

[...] to 100% identical will hybridize and most [...]. One of ordinary skill in the art will recognize that, [...] genetic code, amino acid sequences that are [...] identical can be encoded by DNA sequences as little as 67% [...] or less. Thus, it is preferable, for example, to [...] an overlapping series of shorter probes, [...] the same arrayed library to avoid the problem of degeneracy [...] large numbers of mismatches. Thus, one of skill will recognize [...]

[0114] As evolutionary divergence increases, [...] that searches for orthologous genes [...] will require the use of lower stringency [...] degeneracy of the genetic code is more of a [...]

[...] of the invention [...] species. Such related genes [...]. The fragments of similar sequence that define the gene family typically [...]; similarity will often be concentrated in [...]. The percentage of identity in the amino acid [...] domain that defines the gene family is preferably at least 70%, more preferably 80 to 95%, most preferably 85 to 99%. [...] search for members of a gene family within a species, a low stringency hybridization is [...] distribution and degree of sequence divergence of domains that define the [...]. [...] regulatory region sequence of the SDF as a probe, [...] regulatory regions can be used to identify coordinately expressed [...]
I.C. Methods to Inhibit Gene Expression
[0117] The nucleic acid molecules of the present invention can be used to inhibit gene transcription and/or translation. Examples of such methods include, without limitation:

Antisense Constructs;
Ribozyme Constructs;
Chimeraplast Constructs;
Co-Suppression;
Transcriptional Silencing; and
Other Methods to Inhibit Gene Expression.



[...] In some instances, [...] is the FLAVOR-SAVOR tomato, in [...] approach, thus delaying softening of [...]. U.S. Patent No. 5,859,330; U.S. Patent No. 5,723,766; Oeller, et al., Science, 254:437[...]; [...] Nature, 346:284-287 (1990). Also, timing of flowering can be controlled by suppression of FLOWERING LOCUS C (FLC); high levels of this transcript are associated with late flowering, while absence of [...] early flowering (S.D. Michaels et al., Plant Cell 11:949 (1999)). Also, the transition of apical meristem from [...] production of leaves with associated shoots to flowering is regulated by TERMINAL FLOWER1, APETALA1 and [...]. [...] to induce a transition from shoot production to flowering, it is desirable to suppress TFL1 [...] (S.J. Liljegren, Plant Cell 11:1007 (1999)). As [...] instance, arrested [...] the ethylene forming enzyme but can be reversed by application of ethylene. [...] creating hybrids [...]

[...] generate antisense constructs to inhibit translation and/or [...] degradation [...]. To accomplish this, a polynucleotide [...] from the desired gene (the antisense segment) [...] will be transcribed when the construct is present [...]. A regulated promoter can be used in the construct to [...] under desired circumstances.
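In sequence terms, the antisense segment described above is the reverse complement of the chosen stretch of the target gene. The Python sketch below shows that relationship and a simple check that the antisense sequence pairs with a target; it only tests exact complementarity, whereas the following paragraph notes that less than full identity can still suppress expression. The function names and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_segment(sense: str) -> str:
    """Reverse complement of a sense-strand DNA segment.

    This is the sequence transcribed when the segment is cloned behind a promoter
    in the inverted orientation, so its RNA can base-pair with the target mRNA.
    """
    return sense.translate(COMPLEMENT)[::-1]


def pairs_with(antisense: str, target_sense: str) -> bool:
    """True if `antisense` is exactly complementary to some stretch of `target_sense`."""
    return antisense_segment(antisense) in target_sense


target = "CCATGGCTTAAGG"  # hypothetical sense-strand sequence
anti = antisense_segment("GGCTTA")
print(anti, pairs_with(anti, target))  # TAAGCC True
```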

[0123] For antisense suppression, the introduced antisense [...] to either the primary transcription product or the fully [...]. Generally, a higher percentage of sequence identity can be used to compensate for the use of [...]. Furthermore, the introduced sequence need not have the same intron or exon pattern, and homology of noncoding segments may be equally effective. Normally, [...] can be used, though a sequence [...]

C.2. Ribozymes

[0124] It is also contemplated that gene constructs [...] SEQ TABLES 1 AND 2 are an object of the invention. [...] used to [...] expressed genes by [...] ribozymes that specifically pair with [...] the target RNA [...] suppressing the trans[...] [...] altered, and is thus capable of recycling and [...]. In [...] ribozyme sequences within antisense RNAs confers [...]




[...] WO 99/58723 and WO 99/07865).

[...]

C.4. Sense Suppression

[...] modulate gene [...]

C.5. Transcriptional Silencing

[...] promoter. In such a case, the oligonucleotide can be complementary to the sequences of the promoter that interact with transcription binding factors.

C.6. Other Methods to Inhibit Gene Expression
[0132] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to [...]

[...] performed by selecting clones or R1 plants having a desired phenotype.
I.D. Methods of Functional Analysis

[0135] The [...] described in the methods under I.C. above can be used to determine the function of the [...]

[...] are being modulated by the down-regulation of the [...] to a desired polygenic trait, is sometimes [...] (1999)). [...] can be used in the two-hybrid genetic systems to identify networks of protein-protein interactions [...] (e.g., B. Luo et al., J. Mol. Biol. 266:479 (1997)).



I.E. Promoters 



[...] The SDFs of the invention are also useful [...] the expression of the [...] of the present invention can be useful [...] promoters are more likely to be found in the first 1000 nucleotides [...]. In particular, the promoter is usually [...] modulating the level of transcription with [...]

[0147] These fragments of SDFs [...] isolated for use as elements of gene [...] activity.

[...] presented in REF AND SEQ TABLES 1 AND 2.
[0151] A nucleotide sequence [...] sequence produces a polypeptide having the [...] and the primary transcript is subsequently [...]

[...] In addition to coding [...] more preferably less [...], even more preferably [...] nucleotide sequence comprising a particularly exemplified sequence [...] preferably 1 to 3 amino acid insertions, deletions or changes [...]. [...] most preferred embodiments are those [...] as a hybridization probe.
5 II. Polypeptides and Proteins 

HA. Native polypeptides and proteins 

[0156] Polypeptides [...] the scope of [...]

[...] polypeptides include those [...] sequence identity to those native polypeptides of
[0158] Polypeptide and protein variants [...] will exhibit at least [...]% sequence identity,
REF AND SEQ TABLES 1 AND 2. More preferably [...]

[...] even more preferably [...]

identity. Fragments of polypeptides [...]

[...] activity, receptor binding, signal transduction, [...] activities, the variants preferably exhibit at least

[...] dimensional structure, etc. As to [...] at least 70%, even more preferably at least 80%, 85%, 90%

60% of the activity of the native protein; more preferably

or 95% of at least one activity of the native protein. [...] substitutions, deletions and/or insertions.
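The variant definitions above rest on a minimum percent sequence identity to the native polypeptide, but the passage does not say how identity is computed. The following is a minimal sketch under stated assumptions: the native and variant sequences are already pairwise-aligned, `-` marks a gap, and identity is counted over the residues of the native sequence. The function names and toy sequences are hypothetical illustrations, not part of the disclosure.

```python
def percent_identity(native: str, variant: str) -> float:
    """Percent of native-sequence residues matched identically in the variant.

    Assumptions (illustrative only): both strings are rows of one pairwise
    alignment, '-' marks a gap, and the denominator is the number of
    residues (non-gap positions) in the native sequence.
    """
    if len(native) != len(variant):
        raise ValueError("aligned sequences must have equal length")
    native_residues = sum(1 for a in native if a != "-")
    if native_residues == 0:
        raise ValueError("native sequence contains no residues")
    # Count aligned columns where both rows carry the same residue.
    matches = sum(1 for a, b in zip(native, variant) if a == b and a != "-")
    return 100.0 * matches / native_residues


def meets_threshold(native: str, variant: str, threshold: float) -> bool:
    """True if the variant reaches the given percent-identity threshold."""
    return percent_identity(native, variant) >= threshold


if __name__ == "__main__":
    native = "MKTAYIAKQR"   # toy sequence, not from the source
    variant = "MKTAFIAKQR"  # one substitution (Y -> F)
    print(percent_identity(native, variant))     # 90.0
    print(meets_threshold(native, variant, 85))  # True
```

Other conventions (dividing by the alignment length, or by the shorter sequence) give different values for gapped alignments, so the denominator would need to match whatever definition the claims actually rely on.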

[0160] One type of variant of native polypeptides [...] of the polypeptide.

Conservative substitutions are preferred [...]. As described above, a polypeptide of the invention may
[0161] Within the scope of percentage [...] inserted into the polypeptide in the middle thereof [...]

may be deleted from the polypeptide.
A.1 Antibodies
The antibodies are also useful for examining the production level of proteins in various tissues, for example in a wild-
type plant or following genetic manipulation of a plant, by methods such as Western blotting.
[0163] Antibodies of the present invention, both polyclonal and monoclonal, may be prepared by conventional meth-
ods. In general, the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat,
rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum
obtainable and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is
[...] adjuvant and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of
[...] of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate antibodies by
in vitro immunization using methods known in the art, which for the purposes of this invention is considered equivalent [...]

[0164] Polyclonal antiserum is obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating the blood at 4°C for 2-18 hours. The serum is recovered by
centrifugation (e.g., 1,000×g for 10 minutes). About 20-50 ml per bleed may be obtained [...]
[0165] Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256:495 (1975), or a
modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the
animal to extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single
cells. If desired, the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell
suspension to a plate or well coated with the protein antigen. B-cells producing membrane-bound immunoglobulin
specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or
dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a
selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by
limiting dilution and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then cultured either in vitro
(e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

[0166] Other methods for sustaining antibody-producing B-cell clones, such as by EBV transformation, are known.

[0167] If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques.
Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense re-
agents, enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For
example, horseradish peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a
blue pigment, quantifiable with a spectrophotometer.

A.2 In Vitro Applications of Polypeptides 

[0168] Some polypeptides of the invention will have enzymatic activities that are useful in vitro. For example, the
soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant
proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, [...]
and aspartic proteinases. Thus, these peptides find in vitro use in protein purification protocols and perhaps in
therapeutic settings requiring topical application of protease inhibitors.
[0169] Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis
of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen, and is also involved in chlo-
rophyll biosynthesis (Kaczor et al. (1994) Plant Physiol. 104:1411-7; Smith (1988) Biochem. J. 249:423-8; Schneider,
Z. Naturforsch. [C] 31:55-63). Thus, ALAD proteins can be used as catalysts in the synthesis of heme derivatives.
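The condensation described above (two molecules of 5-aminolevulinate giving one porphobilinogen) can be written as a balanced equation. The molecular formulas below are standard values for the neutral species (5-aminolevulinic acid, C5H9NO3; porphobilinogen, C10H14N2O4), supplied here for illustration rather than taken from the source; the ionized forms present at physiological pH are not shown.

```latex
2\,\mathrm{C_5H_9NO_3} \;\longrightarrow\; \mathrm{C_{10}H_{14}N_2O_4} + 2\,\mathrm{H_2O}
```

Both sides carry 10 C, 18 H, 2 N and 6 O, so each condensation releases two molecules of water.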

[...] generally can be used as catalysts for in vitro synthesis of the compounds repre-

and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers, or
elucidate signal transduction or metabolic pathways. In the case of [...]
[...] in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., [...]

S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999); Q. Lin et al., J. Biol. Chem. 272:27274 (1997)).

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS

[0171] Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum length sequence
(MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections
(C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.
II.B.(1) Variants
[0172] A type of variant of the native polypeptides [...]

[...] above (see...), are preferred to maintain [...] acid residues within the

conservation of charge, polarity, hydrophobicity [...] that acts as a functional equivalent, for example
sequence can be substituted with another [...]

providing a hydrogen bond in an [...]. Substitutions are preferably made among the members of the class to which the
original amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine,
valine, proline, phenylalanine, tryptophan and methionine. The polar neutral amino acids include glycine, serine,
threonine, cysteine, tyrosine, asparagine, and glutamine.

have additional individual amino acids or amino acid sequences [...]

[...] domain or conserved residues by a conservative [...] activities or structural features of the [...]

[0176] [...]

particular domain or group of conserved residues.
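The conservative-substitution rule in this section (replace a residue only with another member of its class) can be written as a small check. This sketch assumes just the two classes the passage enumerates, nonpolar (A, L, I, V, P, F, W, M) and polar neutral (G, S, T, C, Y, N, Q), as one-letter codes; residues outside those classes (for example the charged D, E, K, R, H) are reported as non-conservative here only because no class is given for them. The names are hypothetical.

```python
# One-letter codes for the two residue classes enumerated in the text.
NONPOLAR = frozenset("ALIVPFWM")      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
POLAR_NEUTRAL = frozenset("GSTCYNQ")  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
RESIDUE_CLASSES = (NONPOLAR, POLAR_NEUTRAL)


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues belong to the same listed class.

    Residues outside the listed classes (e.g. charged D, E, K, R, H) are
    reported as non-conservative because no class is defined for them here.
    """
    return any(original in cls and replacement in cls for cls in RESIDUE_CLASSES)


def classify_substitutions(native: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, native residue, variant residue, conservative?)
    for every substituted position in two equal-length, ungapped sequences."""
    if len(native) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (pos, a, b, is_conservative(a, b))
        for pos, (a, b) in enumerate(zip(native, variant), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # Toy sequences (not from the source): T->S and I->L are both within-class.
    print(classify_substitutions("MKTAYIAKQR", "MKSAYLAKQR"))
    # [(3, 'T', 'S', True), (6, 'I', 'L', True)]
```

Extending the check to other classes (acidic, basic) would only require adding further sets to `RESIDUE_CLASSES`.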
II.B.(2) Fragments

[0177] [...] of the instant invention and [...]

II.B.(3) Fusions
[...] thereof of [...]
[0178] [...] are chimeras [...]
[...], and (2) a fragment of a polypeptide [...] which comprises two AP2 [...]. The present [...]

proteins or fragments thereof.
DEFINITION OF DOMAINS 
[0179] The polypeptides of the invention may possess [...]

[...] within the MLS encoded [...]
domain within the MLS encoded polypeptide [...] of the domain [...]

(http://www.expasy.ch/prosite/), and Pfam,
(http://pfam.wustl.edu/browse.shtml).

1. (AAA) AAA-protein family signature
[0181] A large family of ATPases [...]
[...] containing two AAA domains: SEC18 [...]

homooligomer composed of six subunits.

[...] proliferation. - PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia [...]
- Yeast protein PAS1, essential for [...] transduction [...]

[...] connecting [...] cell [...]

[0183] It is an integral membrane protein with a large cyt[...]

the protease domains. [...] mitochondrial compartment. YME1 is [...]

protease domain [...] dependent [...]

[...] complex of the 26S proteasome [6], which is [...]
[0184] Subunits from the regulatory complex of the [...]

degradation of ubiquitinated proteins: [...]
[...] 4 and [...] in yeast (gene [...]) [...]

[...]
Yeast protein SAP1.

Consensus pattern: [...]
[...] 209-224(1993).

[5] Confalonieri F., Duguet M. BioEssays 17:639-650(1995).
[6] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

2. ABC Membrane (ABC transporter transmembrane region). This family represents a unit of six transmembrane
helices. Many members of the ABC transporter family (ABC_tran) have two such regions. See also descriptions of
ABC_tran, below, and ABC2 membrane, above.

3. (ABC Tran) 


ABC transporters family signature 

[0187] On the basis of sequence similarities, a family of related ATP-binding proteins has been characterized [1 to 5].
These proteins are associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, but a
majority of them are involved in active transport of small hydrophilic molecules across the cytoplasmic membrane. All
these proteins share a conserved domain of some two hundred amino acid residues, which includes an ATP-binding
site. These proteins are collectively known as ABC transporters. Proteins known to belong to this family are listed
below (references are only provided for recently determined sequences).
In prokaryotes:
- Active transport systems components: alkylphosphonate uptake (phnC/phnK/phnL); arabinose (araG); arginine (artP);
dipeptide (dciAD; dppD/dppF); ferric enterobactin (fepC); ferrichrome (fhuC); galactoside (mglA); glutamine (glnQ);
glycerol-3-phosphate (ugpC); glycine betaine/L-proline (proV); glutamate/aspartate (gltL); histidine (hisP); iron(III)
(sfuC); iron(III) dicitrate (fecE); lactose (lacK); leucine/isoleucine/valine (braF/braG; livF/livG); maltose (malK);
molybdenum (modC); nickel (nikD/nikE); oligopeptide (amiE/amiF; oppD/oppF); peptide (sapD/sapF); phosphate (pstB);
putrescine (potG); ribose (rbsA); spermidine/putrescine (potA); sulfate (cysA); vitamin B12 (btuD).
- Hemolysin/leukotoxin export proteins hlyB, cyaB and lktB.
- Colicin V export protein cvaB.
- Lactococcin export protein lcnC [6].
- Lantibiotic transport proteins nisT (nisin) and spaT (subtilin).
- Extracellular proteases B and C export protein prtD.
- Alkaline protease secretion protein aprD.
- Beta-(1,2)-glucan export proteins chvA and ndvA.
- Haemophilus influenzae capsule-polysaccharide export protein bexA.
- Cytochrome c biogenesis proteins ccmA (also known as cycV and helA).
- Polysialic acid transport protein kpsT.
- Cell division associated ftsE protein (function unknown).
- Copper processing protein nosF from Pseudomonas stutzeri.
- Nodulation protein nodI from Rhizobium (function unknown).
- Escherichia coli proteins cydC and cydD.
- Subunit A of the ABC excision nuclease (gene uvrA).
- Erythromycin resistance protein from Staphylococcus epidermidis (gene msrA).
- Tylosin resistance protein from Streptomyces fradiae (gene tlrC) [7].
- Heterocyst differentiation protein (gene hetA) from Anabaena PCC 7120.
- Protein P29 from Mycoplasma hyorhinis, a probable component of a high affinity transport system.
- yhbG, a putative protein whose gene is linked with ntrA in many bacteria such as Escherichia coli, Klebsiella
pneumoniae, Pseudomonas putida, Rhizobium meliloti and Thiobacillus ferrooxidans.
- Escherichia coli and related bacteria hypothetical proteins yabJ, yadG, yagC, ybbA, ycjW, yddA, yehX, yejF, yheS,
yhiG, yhiH, yjcW, yjjK, yojI, yrbF and ytfR.
In eukaryotes:
- The multidrug transporters (Mdr) (P-glycoprotein), a family of closely related proteins which extrude a wide variety
of drugs out of the cell (for a review see [8]).
- Cystic fibrosis transmembrane conductance regulator (CFTR), which is most probably involved in the transport of
chloride ions.
- Antigen peptide transporters 1 (TAP1, PSF1, RING4, HAM-1, mtp1) and 2 (TAP2, PSF2, RING11, HAM-2, mtp2), which
are involved in the transport of antigens from the cytoplasm to a membrane-bound compartment for association with
MHC class I molecules.
- 70 kDa peroxisomal membrane protein (PMP70).
- ALDP, a peroxisomal protein involved in X-linked adrenoleukodystrophy [9].
- Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive potassium channel.
- Drosophila proteins white (w) and brown (bw), which are involved in the import of ommatidium screening pigments.
- Fungal elongation factor 3 (EF-3).
- Yeast STE6, which is responsible for the export of the a-factor pheromone.
- Yeast mitochondrial transporter ATM1.
- Yeast MDL1 and MDL2.
- Yeast SNQ2.
- Yeast sporidesmin resistance protein (gene PDR5 or STS1 or YDR1).
- Fission yeast heavy metal tolerance protein hmt1. This protein is probably involved in the transport of metal-bound
phytochelatins.
- Fission yeast brefeldin A resistance protein (gene bfr1 or hba2).
- Fission yeast leptomycin B resistance protein (gene pmd1).
- mbpX, a hypothetical chloroplast protein from liverwort.
- Prestalk-specific protein tagB from slime mold. This protein consists of two domains: an N-terminal subtilase
catalytic domain and a C-terminal ABC transporter domain.
As a signature pattern for this class of proteins, a conserved region which is located between the 'A' and the 'B'
motifs of the ATP-binding site was used.

[0188] Consensus pattern: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-
[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA]. The ATP-binding re-
gion is duplicated in araG, mdl, msrA, rbsA, tlrC, uvrA, yejF, Mdr's, CFTR, pmd1 and in EF-3. In some of those proteins,
the above pattern only detects one of the two copies of the domain. The proteins belonging to this family also contain
one or two copies of the ATP-binding motifs 'A' and 'B'.
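PROSITE consensus patterns such as the one above use a compact syntax: elements are separated by `-`, `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden ones, and `(n)` or `(n,m)` gives a repeat count. A minimal sketch of translating that syntax into a Python regular expression, so that a protein sequence can be scanned for the ABC transporter signature, follows. It assumes uppercase one-letter codes and does not handle the `<`/`>` terminus anchors; the function name and the synthetic test sequence are hypothetical and are not the method used in the source.

```python
import re

# PROSITE element: residue, 'x', [allowed] or {forbidden}, optional (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern (without < or > anchors) into a regex string."""
    tokens = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            token = "."                   # any residue
        elif core.startswith("{"):
            token = f"[^{core[1:-1]}]"    # any residue except those listed
        else:
            token = core                  # single residue or [..] class as-is
        if low is not None:
            token += f"{{{low},{high}}}" if high else f"{{{low}}}"
        tokens.append(token)
    return "".join(tokens)


ABC_SIGNATURE = (
    "[LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-"
    "[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA]"
)

if __name__ == "__main__":
    regex = re.compile(prosite_to_regex(ABC_SIGNATURE))
    toy = "MTKEELSGGQRQRVAIARALVNDP"  # synthetic sequence, not from the source
    hit = regex.search(toy)
    print(hit.start() + 1, hit.group())  # 6 LSGGQRQRVAIARAL
```

Because `re.search` reports only the first hit, `regex.finditer` would be the natural choice for proteins in which, as noted above, the ATP-binding region is duplicated.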
4. Acyl-CoA-binding protein signature

[0189] Acyl-CoA-binding protein (ACBP) is a [...] with very high affinity and may [...] as diazepam binding
inhibitor (DBI) [...] ability to displace diazepam from the benzodiazepine (BZD) recognition site located on the
GABA [...] receptor. It is therefore possible that this protein also acts as a neuropeptide to modulate the action of
the [...]. ACBP is a highly conserved protein of about 90 [...] related to the N-terminal section of a probable
transmembrane protein of unknown [...]. [...] ACBP was selected.

[1] [...] Schultz E.R., Todaro G.J. Proc. Natl. Acad. Sci. U.S.A. 89:11287-11291(1992).
[...] 49:325-344(1991).



5. (AIRS)

AIR synthase related proteins

[0190] This family includes Hydrogen expression/formation protein HypE, AIR synthases, FGAM synthase and
selenide, water dikinase.

6. (AMP-binding) 

Putative AMP-binding domain signature 

[0191] It has been shown [...] that a number of [...] ATP-dependent covalent binding of AMP to [...]
- Insect luciferase (luciferin 4-monooxygenase) [...] in the presence of ATP and molecular oxygen.
- [...] (gene LYS2). This enzyme catalyzes the activation of alpha-aminoadipate [...] and the reduction of activated
alpha-aminoadipate by NADPH.
- Acetate--CoA ligase (acetyl-CoA synthetase) [...] acetyl-CoA from acetate and CoA.
- Long-chain-fatty-acid--CoA ligase [...] the synthesis of cellular lipids [...].
- 4-coumarate--CoA ligase (4CL), a plant enzyme that catalyzes the formation of [...] between general
phenylpropanoid metabolism and pathways leading to various specific end products.
- O-succinylbenzoic acid--CoA ligase (OSB-CoA synthetase) (gene menE) [6], a bacterial enzyme [...] menaquinone
(vitamin K2).
- [...] an enzyme involved in the degradation of 4-CBA [...].
- [...] ligase (IAA-lysine synthetase) [8] from Pseudomonas syringae that converts [...].
- Bile acid-CoA ligase (gene baiB) from Eubacterium strain VPI 12708 [4]. This enzyme [...] formation of a variety
of C-24 bile acid-CoA [...].
- Crotonobetaine/carnitine-CoA ligase (EC 6.3.2.-) from Escherichia coli (gene caiC).

EP 1 033 405 A2

- L-(alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACV synthetase) from various fungi (gene acvA or pcbAB).
This enzyme catalyzes the first step in the biosynthesis of pen[...] amino acids. The amino acids seem to be activated
by [...]. It is a protein of [...] domains of about [...] amino acids [...].
- [...] (gene grsA) from Bacillus brevis. This enzyme catalyzes the first step in the [...] ATP-dependent racemization
of phenylalanine.
- Tyrocidine [...] from Bacillus brevis. The [...] tycA is [...] to that catalyzed by grsA.
- Gramicidin [...] (gene grsB) [...]. This is a multifunctional protein that activates and polymerizes [...], valine,
ornithine and leucine. GrsB consists [...].
- [...]bactin synthetase component [...] involved in the ATP-dependent activation [...] biosynthesis.
- Cyc[...] contains three related domains [...].
- [...] (gene HTS1) from Cochliobolus carbonum [...] four amino acids (Pro, L-Ala, [...] oxodecanoic acid) that make
up HC-toxin [...].
There are also some proteins, whose ex[...] probably also AMP-binding enzymes. These proteins are:
- ORA (octapeptide-repeat antigen) [...].
- [...] angR [...] which shows a high degree of [...]. [...] to be a transcriptional [...]. angR is not a DNA-binding
protein [...] operon. But it is [...] biosynthesis of anguibactin [...] angR (1048 residues), which is far bigger [...]
hypothetical protein [...].

[4] [...] 6:529-546(1992).
[...] Liang P.-H. [...]
[7] Babbitt P.C., Kenyon G.L., [...] 5604(1992).



are also described in Jofuku et al., cop[...]
09/026,039.



7. AP2 domain 
[0193]



[3] Mushegian AR, Koonin EV, [...]



8. ARID 



[0194] The ARID domain is an AT-Rich Interaction domain sharing structural homology to DNA replication and repair
nucleases and polymerases.

[...] Genes Dev 1995;9:3067-3082.
9. (ATP synt)



10. (ATP Synt A)

12. (ATP synt C) 
[0200] Consensus pattern: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE] [D or E binds DCCD]
[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[4] Recipon H., Perasso R., Adoutte A., Quetier F. J. Mol. Evol. 34:292-303(1992).

13. (ATPsynt DE) 

ATP synthase, Delta/Epsilon chain 

[...] equivalent to the Oligomycin sensitive subunit (OSCP) in metazoans.

14. (ATPsyntab) 

ATP synthase alpha and beta subunits signature 
[0202] ATP synthase ([...]) [...] the inner membrane of [...] of an oligomeric transmembrane sector, [...] called
coupling factor CF(1). The former acts as a proton channel; the latter is composed of five subunits, alpha, beta,
gamma, delta and epsilon. The sequences of subunits alpha and beta are related and both contain a [...] ATP and
ADP. The beta chain has catalytic activity, while the alpha chain is a regulatory subunit. [...] are responsible for [...]
oligomeric complexes [...]

Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site residue]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[...] Oshima T., Konishi J., Denda K., Yoshida M. Proc. Natl. Acad. Sci. U.S.A. [...]
[5] Dreyfus G., Williams A.W., Kawagishi I., MacNab R.M. J. Bacteriol. 175:3131-3138(1993).

15. (ATPsyntabC) 

ATP synthase ab C terminal.

Abrahams JP, Leslie AG, Lutter R, Walker JE; "Structure at 2.8 Å resolution of F1-ATPase from bovine heart mito-
chondria." Nature 1994;370:621-628.

16. (A deaminase)

Adenosine and AMP deaminase signature 

[0205] Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP deaminase catalyzes [...]
[...] that are potential active site residues [...] [putative active site residues]

17. (Acetyltransf) 
Acetyltransferase (GNAT) family.

18. (Aconitase C)

Aconitase family signature

[0207] Aconitase (aconi[...]) [...] from the tricarboxylic acid cycle that catalyzes [...] product during the [...]

[...] matrix and the other found in the cytoplasm. Aconitase in its [...] shown that the aconitase family [...]

[...] Artymiuk P.J., Guest J.R. Trends Biochem. Sci. [...]

19. (Acyl-CoA dh)
Acyl-CoA dehydrogenases signatures

[...] the mitochondrion. They [...] glutaryl-CoA [...]

[...] except VLCAD, which is a dimer [...]
- Glutaryl-CoA dehydrogenase (EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and
tryptophan.
- Isovaler[...] [...] dehydrogenases [...]
[4] Eichler K., Bourgis F., Buchet A., [...]

20. (Acyl transf) 

Acyl transferase domain 

[...] acyl carrier protein transacylase at [...]
1995;270:12961-12964.

21. Acylphosphatase signatures
[...] of various acylphosphates [...] carboxyl-phosphate [...]

[...] bacterial and archaebacterial hypothetical [...] Escherichia coli hypothetical protein ycc[...] As signature patterns [...]

Consensus pattern: G-[FYW]-[...]



22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures

[0211] Clathrin coated vesicles (CCV) [...] traffic such as receptor-mediated endocytosis [...]
[...] are known as adaptor or clathrin [...]

[...] with the cytoplasmic tails of membrane p[...] associated with the Golgi complex [...] the adaptins.

Consensus pattern: [LIV]-x-F-I-P-P-[...]

[3] Nakayama Y., Goebl M., [...]
569-574(1991).
23. (Adenylsucc synt)
Adenylosuccinate synthetase signatures 

[0212] Adenylosuccinate synthetase (EC 6.3.4.4) [1] plays an important role in purine biosynthesis, by catalyzing the
GTP-dependent conversion of IMP and aspartic acid to AMP. Adenylosuccinate synthetase has been characterized
from various sources ranging from Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are
present: one involved in purine biosynthesis and the other in the purine nucleotide cycle. Two conserved regions were
selected as signature patterns. The first one is a perfectly conserved octapeptide located in the N-terminal section and
which is involved in GTP-binding [2]. The second one includes a lysine residue known [2] to be essential for the enzyme's
activity.

Consensus pattern: Q-W-G-D-E-G-K-G 

Consensus pattern: G-I-[GB]-P-x-Y-x(2)-K-x(2)-R [K is the active site residue]

[1] Wiesmueller L., Wittbrodt J., Noegel A.A., Schleicher M. J. Biol. Chem. 266:2480-2485(1991).

[2] Silva M.M., Poland B.W., Hoffman C.R., Fromm H.J., Honzatko R.B. J. Mol. Biol. 254:431-446(1995).
[3] Bouyoub A., Barbier G., Forterre P., Labedan B. J. Mol. Biol. 261:144-154(1996).

24. (AdoHcyase) 

S-adenosyl-L-homocysteine hydrolase signatures

[0213] S-adenosyl-L-homocysteine hydrolase (EC 3.3.1.1) (AdoHcyase) is an enzyme of the activated methyl cycle,
responsible for the reversible hydration of S-adenosyl-L-homocysteine into adenosine and homocysteine. AdoHcyase
is a ubiquitous enzyme which binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein
[1] of about 430 to 470 amino acids. Two highly conserved regions were selected as signature patterns. The first pattern
is located in the N-terminal section; the second is derived from a glycine-rich region in the central part of AdoHcyase,
a region thought to be involved in NAD-binding.

Consensus pattern: [GSA]-[CS]-N-x-[FYLM]-S-[ST]-[QA]-[DEN]-x-[AV]-[AT]-[AD]-[AC]-[LIVMCG]
Consensus pattern: [GA]-[KS]-x(3)-[LIV]-x-G-[FY]-G-x-[VC]-G-[KRL]-G-x-[ASC]
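The two AdoHcyase signatures are described as occurring in a fixed order, the first in the N-terminal section and the second in the central, NAD-binding region. A minimal sketch, assuming the two consensus patterns above hand-translated into Python regular expressions (`x` becomes `.`, `x(3)` becomes `.{3}`), that reports where each signature starts and whether they appear in that order. The function name and the synthetic test sequence are hypothetical.

```python
import re

# Hand translations of the two AdoHcyase consensus patterns given above.
ADOHCYASE_1 = re.compile(r"[GSA][CS]N.[FYLM]S[ST][QA][DEN].[AV][AT][AD][AC][LIVMCG]")
ADOHCYASE_2 = re.compile(r"[GA][KS].{3}[LIV].G[FY]G.[VC]G[KRL]G.[ASC]")


def adohcyase_signatures(sequence: str) -> dict:
    """Return 1-based start positions of both signatures (None if absent) and
    whether the N-terminal signature precedes the NAD-binding one."""
    first = ADOHCYASE_1.search(sequence)
    second = ADOHCYASE_2.search(sequence)
    result = {
        "signature_1": first.start() + 1 if first else None,
        "signature_2": second.start() + 1 if second else None,
    }
    # Ordered only if both are present and the first starts earlier.
    result["ordered"] = bool(first and second and first.start() < second.start())
    return result


if __name__ == "__main__":
    # Synthetic sequence built to contain one instance of each pattern.
    toy = "M" + "GCNTFSTQDLAAAAL" + "PEPEPE" + "GKVVVVAGYGDVGKGCA" + "K"
    print(adohcyase_signatures(toy))
    # {'signature_1': 2, 'signature_2': 23, 'ordered': True}
```

A positional check like this only confirms order; judging whether a hit actually lies in the "N-terminal section" would need an explicit, and here unstated, cutoff on sequence position.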

[1] Sganga M.W., Aksamit R.R., Cantoni G.L., Bauer C.E. Proc. Natl. Acad. Sci. U.S.A. 89:6328-6332(1992).

25. AhpC/TSA family 

[0214] This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and thiol specific
antioxidant (TSA). 

[1] Chae HZ, Robison K, Poole LB, Church G, Storz G, Rhee SG, Proc Natl Acad Sci U S A 1994;91:7017-7021 

26. (Aldose epim) 

Aldose 1-epimerase putative active site

[0215] Aldose 1-epimerase (EC 5.1.3.3) (mutarotase) is the enzyme responsible for the anomeric interconversion of
D-glucose and other aldoses between their alpha- and beta-forms. The sequence of mutarotase from two bacteria,
Acinetobacter calcoaceticus and Streptococcus thermophilus, is available [1]. It has also been shown that, on the basis
of extensive sequence similarities, a mutarotase domain seems to be present in the C-terminal half of the fungal GAL10
protein, which encodes, in the N-terminal part, UDP-glucose 4-epimerase. The best conserved region in the sequence
of mutarotase is centered around a conserved histidine residue which may be involved in the catalytic mechanism.
Consensus pattern: [NS]-x-T-N-H-x-Y-[FW]-N-[LI]

[1] Poolman B., Royer T.J., Mainzer S.E., Schmidt B.F. J. Bacteriol. 172:4037-4047(1990).

27. (AlkA DNA repair) 

Alkylbase DNA glycosidases alkA family signature 

[0216] Alkylbase DNA glycosidases [1] are DNA repair enzymes that hydrolyze the deoxyribose N-glycosidic bond
to excise various alkylated bases from a damaged DNA polymer. In Escherichia coli there are two alkylbase DNA
glycosidases: one (gene tag) which is constitutively expressed and which is specific for the removal of 3-methyladenine
(EC 3.2.2.20), and one (gene alkA) which is induced during adaptation to alkylation and which can remove a variety



of alkylation products (EC 3.2.2.21). Tag and [...] region of sequence similarity. In yeast there is a DNA glycosidase
(gene [...]) [...] or 7-methyladenine and which [...] acid residues. While the C- and [...]

[1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. [...]
28. Ammonium transporters signature

29. (Arch_histone) 
CBF/NF-Y subunits signatures 

[...] their expression. It also recognizes the sequence CCAAT and [...] HAP3 [...] fission yeast [...]

The [...] subunit of CBF, known as CBF-A or NF-YB [...], is a protein of 116 to 210 amino-acid residues [...] domain of
about [...] residues. This domain seems to be involved in DNA-binding; a signature pattern [...] fission yeast [...]
protein [...].

The [...] subunit of CBF, known as CBF-B or NF-YA in vertebrates, [...] is a protein of 265 to 350 amino-acid residues
which contains a [...] region of about 60 residues. This region, called the 'essential core' [2], seems [...] an
N-terminal subunit-association domain and a C-terminal [...].

[1] [...] 1087-1091(1992).
[2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures

[0219] Argininosuccinate synthase (EC 6.3.4.5) (AS) is a urea cycle enzyme that catalyzes the penultimate step in
arginine biosynthesis, the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and pyro-
phosphate [1,2]. In humans, a defect in the AS [...] citrullinemia, a genetic [...]. [...] enzyme of chains of about 400
amino-acid residues.

[3] Cavallo R, Rubenstein D, Peifer M. Curr Opin Genet Dev [...]
[6] Peifer M, Wieschaus E, Cell 1990;63:1167-1176.
32. (Asn Synthase) 

[...] aspartate to asparagine.

33. Asparaginase_2 


Asparaginase 12 members 

34. (Aspartyl tRNA N) 

Aminoacyl-transfer RNA synthetases class-II signatures

[2] Delarue M., Moras D. BioEssays [...]
[5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).

Number of members: 34
[0223]

[...] form a pair of alpha helices. Number of members: 42

37. Amino acid permeases signature

[0225] Amino acid permeases a[...]
A number of such proteins have been found to be evo[...]:
- [...] amino acid permease (gene ALP1).
- Yeast [...] amino acid permeases (genes GAP1, AGP2 and AGP3).
- [...] (gene CAN1).
- Yeast dicarboxylic amino acid permease (gene DIP5).
- [...] Leu/Val/Ile permease (gene BAP[...]).
- Yeast asparagine/glutamine [...].
- [...] permease (gene HIP1).
- Yeast [...] valine and tyrosine permease (gene VAL1/TAT1).
- Yeast [...] YKL174c.
- [...] transport protein (gene HNM1/CTR1).
- [...] yeast hypothetical protein [...].
- Fission yeast protein isp5.
- Fission yeast hypothetical protein SP[...]11D3.08c.
- Emericella nidulans proline transport protein [...].
- [...] amino acid permease INDA1.
- Salmonella typhimurium L-asparag[...].
- [...] transporter (gene cycA).
- Escherichia coli [...] transport protein (gene aroP).
- Escherichia coli [...] (gene lysP).
- Escherichia coli phenylalanine-specific permease (gene pheP).
- [...] GABA permease (gene gabP).
- Escherichia [...] Y).
- Escherichia coli and Klebsiella pneumoniae hypothetical [...].
[...] arginine or ornithine. These proteins [...]. As a signature pattern for this family of proteins, the best conserved
region [...] was selected.
Consensus pattern: [...]-[LIVMST]-x(3)-[LIVMCTA]-[GA]-E-x(5)-[PSAL]-[...]

[1] Weber E., Chevalier M.R., Jund R. J. Mol. Evol. 27:[...](1988).

38. aakinase (1) Glutamate 5-kinase signature

[0226] Glutamate 5-kinase (EC 2.7.2.11) (GK) is the enzyme that catalyzes the first step
in the biosynthesis of proline from glutamate, the ATP-dependent phosphorylation of L-glutamate into L-glutamate
5-phosphate. In [...] (gene proB) and yeast [1] (gene PRO1), GK is a monofunctional protein while in plants
and mammals it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal GK domain and a C-
terminal GPR domain (EC 1.2.1.41) (see <PDOC00940>). As a signature pattern, a
highly conserved glycine- and alanine-rich region located in the central section of these enzymes has been selected.
Yeast hypothetical protein YHR033W is highly similar to GK.

[1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992).

[2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).
aakinase (2) Aspartokinase signature

[0227] Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product of this reaction
can then be used in the biosynthesis of lysine or in the pathway leading to homoserine, which participates in the
biosynthesis of threonine, isoleucine and methionine. In Escherichia coli, there are three different [...]
[...] sensitivity to repression and inhibition [...]

[...] enzymes which both consist of an N-terminal AK domain and a C-terminal homoserine dehydrogenase domain. AK1
is involved in threonine biosynthesis and AK2 in that of methionine. The third [...]
[...] monofunctional and involved in lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). As a signature
pattern for AK, a conserved region located in the N-terminal extremity has been selected.
Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]
[1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988).

aakinase (3) Gamma-glutamyl phosphate reductase signature

[0228] Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the second step in
the biosynthesis of proline from glutamate, the NADP-dependent reduction of L-glutamate 5-phosphate into L-glutamate
5-semialdehyde and phosphate. In eubacteria (gene proA) and yeast [1] (gene PRO2), GPR is a monofunctional
protein, while in plants and mammals it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal
glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR domain. As a signature pattern, [...]

Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I

[1] Pearson B.M., Hernando Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. Yeast 12:1021-1031(1996).

[2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

39. (abhydrolase) alpha/beta hydrolase fold. This catalytic domain is found in a very wide range of enzymes. 

[0229] [1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag J, Sussman JL, Verschueren KHG, Goldman A, Protein Eng 1992;5:197-211.

40. (Acid phosphat) Histidine acid phosphatases signatures 

[0230] Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze phosphate esters, optimally at low pH. It has been shown [1] that a number of acid phosphatases, from both prokaryotes and eukaryotes, share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines seem to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the N-terminal section and forms a phosphohistidine intermediate, while the second is located in the C-terminal section and possibly acts as proton donor.

Enzymes belonging to this family are called 'histidine acid phosphatases' and are listed below.

- Escherichia coli pH 2.5 acid phosphatase (gene appA). 

- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).

- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).

- Fission yeast acid phosphatase (gene pho1).

- Aspergillus phytases A and B (EC 3.1.3.8) (genes phyA and phyB).

- Mammalian lysosomal acid phosphatase. 

- Mammalian prostatic acid phosphatase. 
- Caenorhabditis elegans hypothetical proteins C05C10.1, C05C10.4 and F26C11.1.

[0231] Consensus pattern: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS] [H is an active site residue]
Consensus pattern: [LIVM]-[...]-x(2)-[STA] [H is an active site residue]

[ 2] Ostanin K., Harms E.H., Stevis P.E.

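41. Aconitase family signatures

Aconitase (aconitate hydratase) (EC 4.2.1.3) is the enzyme of the tricarboxylic acid cycle that catalyzes the reversible isomerization of citrate and isocitrate. The iron-responsive element binding protein (IRE-BP) is related to aconitase; it is a cytoplasmic protein that binds to iron-responsive elements (IREs) such as those of ferritin and transferrin receptor mRNA.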

42. Actins signatures

Actins are highly conserved proteins that are present in all eukaryotic cells. In vertebrates, the beta and gamma actins are components of the cytoskeleton. In plants there are many isoforms, which are probably involved in a variety of functions such as cytoplasmic streaming. Actin exists either in a monomeric form (G-actin) or in a polymerized form (F-actin).
[ 4] Rubenstein P.A. BioEssays 12:309-315(1990).

[ 5] Meagher R.B., McLean B.G. Cell Motil. Cytoskeleton 16:164-166(1990). 
43. Adenylate kinase signature 

[0234] Adenylate kinase (EC 2.7.4.3) (AK) [1] is a small monomeric enzyme that catalyzes the reversible transfer of a phosphate group from MgATP to AMP (MgATP + AMP = MgADP + ADP). In mammals there are three different isozymes:
- AK1 (or myokinase), which is cytosolic.
- AK2, which is located in the outer compartment of mitochondria.
- AK3 (or GTP:AMP phosphotransferase), which is located in the mitochondrial matrix and which uses MgGTP instead of MgATP.
The sequence of AK has also been obtained from different bacterial species and from plants and fungi. Two other enzymes have been found to be evolutionarily related to AK. These are:
- Yeast uridylate kinase (EC 2.7.4.-) (UK) (gene URA6) [2], which catalyzes the transfer of a phosphate group from ATP to UMP to form UDP and ADP.
- Slime mold UMP-CMP kinase (EC 2.7.4.14) [3], which catalyzes the transfer of a phosphate group from ATP to either CMP or UMP to form CDP or UDP and ADP.
Several regions of AK family enzymes are well conserved, including the ATP-binding domains. The most conserved of these regions has been selected as a signature for this type of enzyme. This region includes an aspartic acid residue that is part of the catalytic cleft of the enzyme and that is involved in a salt bridge. It also includes an arginine residue whose modification leads to inactivation of the enzyme.
Consensus pattern: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ].

[ 1] Schulz G.E. Cold Spring Harbor Symp. Quant. Biol. 52:429-439(1987). 

[ 2] Liljelund P., Sanni A., Friesen J.D., Lacroute F. Biochem. Biophys. Res. Commun. 165:464-473(1989).
[ 3] Wiesmueller L., Noegel A.A., Barzu O., Gerisch G., Schleicher M. J. Biol. Chem. 265:6339-6345(1990).
[ 4] Kath T.H., Schmid R., Schaefer G. Arch. Biochem. Biophys. 307:405-410(1993).
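Repeat counts such as `[LIVMFYW](3)` and `x(3)` in the adenylate kinase pattern become regex quantifiers. The Python sketch below prints that element-by-element translation, so the mapping from PROSITE notation to regex can be checked by eye. It assumes standard PROSITE syntax. The helpers `prosite_to_regex` and `explain` are hypothetical illustrations, and the scanned sequence is synthetic.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (or a single element) into a regex."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        core = "." if residue == "x" else residue
        if residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"
        if high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low       # e.g. [LIVMFYW](3) -> [LIVMFYW]{3}
        pieces.append(core)
    return "".join(pieces)

def explain(pattern):
    """Pair each PROSITE element with the regex fragment it becomes."""
    return [(element, prosite_to_regex(element)) for element in pattern.split("-")]

AK = "[LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]"

for element, fragment in explain(AK):
    print(f"{element:>12} -> {fragment}")

# Synthetic test sequence, made up for illustration.
print(re.search(prosite_to_regex(AK), "GKLVIDGFPRNLDQAKE").group())  # LVIDGFPRNLDQ
```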

[0235] 44. (adh_short) Short-chain dehydrogenases/reductases family signature. 

[0236] The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases [2,3,4]. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently known to belong to this family are listed below.
- Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).
- D-beta-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
- Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
- Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
- 3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni.
- 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
- Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
- Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human.
- Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno).
- 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
- Retinol dehydrogenase (EC 1.1.1.105) from mammals.
- 2-deoxy-D-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
- Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD).
- 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
- Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii.
- NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
- N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8.
- D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
- Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
- Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania.
- 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis.
- Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL).
- Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
- Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
- Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
- 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA).
- Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals.
- Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
- Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
- Versicolorin reductase from Aspergillus parasiticus (gene VER1).
- Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
- A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity.
- Nodulation protein nodG from species
in the biosynthesis of D-alanyl-lipoteichoic acid.
- Human follicular variant translocation protein 1.
- Adipocyte protein P27.
- Mouse protein Ke 6.
- Maize sex determination protein TASSELSEED 2.
- Sarcophaga peregrina 25 Kd development specific protein.
- Drosophila fat body protein P6.
- A Listeria monocytogenes hypothetical protein encoded in the internalins gene region.
- Escherichia coli hypothetical protein yciK.
- Escherichia coli hypothetical protein ydfG.
- Escherichia coli hypothetical protein yjgI.
- Escherichia coli hypothetical protein yjgU.
- Escherichia coli hypothetical protein yohF.
- Bacillus subtilis hypothetical protein yoxD.
- Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH.
- Yeast hypothetical protein YIL124W.
- Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c.
- Yeast hypothetical protein YKL055C.
- Fission yeast hypothetical protein SPAC23D3.11.
One of the best conserved regions, which includes two perfectly conserved residues (a tyrosine and a lysine), has been used as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:6003-6013(1995).

[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).

[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).
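The SDR pattern contains an exclusion, `{PC}`, meaning any residue except proline or cysteine. It also carries an annotation, "[Y is an active site residue]". The Python sketch below turns the exclusion into a negated regex class. It wraps the annotated element in a capture group, so that the active-site tyrosine can be reported as a sequence position. The helper names, the `mark` parameter and the test sequence are hypothetical. The sequence is synthetic, built to satisfy the pattern, and does not come from a real SDR.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern, mark=None):
    """Translate a PROSITE-style pattern; element number `mark` (0-based) is captured."""
    pieces = []
    for index, element in enumerate(pattern.strip().rstrip(".").split("-")):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            core = "."
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # {PC} -> [^PC]
        else:
            core = residue
        if high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        if index == mark:
            core = "(" + core + ")"             # capture the annotated residue
        pieces.append(core)
    return "".join(pieces)

def active_site_positions(sequence, pattern, mark):
    """1-based positions of the marked element in every non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern, mark))
    return [m.start(1) + 1 for m in regex.finditer(sequence)]

SDR = ("[LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-"
       "[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM]")
ACTIVE_SITE_Y = 2  # 0-based index of the 'Y' element in SDR

synthetic = "ML" + "E" * 12 + "YSAAKEALEEVEEELG"   # made-up test sequence
print(active_site_positions(synthetic, SDR, ACTIVE_SITE_Y))   # [15]
```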

[0238] 46. (adh_zinc) Zinc-containing alcohol dehydrogenases signatures
Alcohol dehydrogenase (EC 1.1.1.1) (ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the concomitant reduction of NAD [1]. Currently three structurally and catalytically different types of alcohol dehydrogenases are known:
- Zinc-containing 'long-chain' alcohol dehydrogenases.
- Insect-type, or 'short-chain' alcohol dehydrogenases.
- Iron-containing alcohol dehydrogenases.
Zinc-containing ADHs [2,3] are dimeric or tetrameric enzymes that bind two atoms of zinc per subunit. One of the zinc atoms is essential for catalytic activity while the other is not. Both zinc atoms are coordinated by either cysteine or histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing ADHs are found in bacteria, mammals, plants, and fungi. In most species there is more than one isozyme (for example, humans have at least six isozymes, yeast has three, etc.). A number of other zinc-dependent dehydrogenases are closely related to zinc ADH [4]; these are:
- Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase).
- Sorbitol dehydrogenase (EC 1.1.1.14).
- Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol dehydrogenase).
- Threonine 3-dehydrogenase (EC 1.1.1.103).
- Cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme involved in the biosynthesis of lignin.
- Galactitol-1-phosphate dehydrogenase (EC 1.1.1.251).
- Pseudomonas putida 5-exo-alcohol dehydrogenase (EC 1.1.1.-) [6].
- Escherichia coli starvation sensing protein rspB.
- Escherichia coli hypothetical protein yjgB.
- Escherichia coli hypothetical protein yjgV.
- Escherichia coli hypothetical protein yjjN.
- Yeast hypothetical protein YAL060w (FUN49).
- Yeast hypothetical protein YAL061w (FUN50).
- Yeast hypothetical protein YCR105w.
The pattern that has been developed to detect this class of enzymes is based on a conserved region that includes a histidine residue which is the second ligand of the catalytic zinc atom. This family also includes NADP-dependent quinone oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in mammals where, in some species such as rodents, it has been recruited as an eye lens protein and is known as zeta-crystallin [7]. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol dehydrogenases, and it lacks the zinc-ligand residues. The torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related to qor. A specific pattern has been developed for this subfamily.
Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand] 
Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR].

[ 1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition) 11:104-190(1975).
[ 2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987).
[ 3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).

[ 4] Persson B., Hallborn J., Walfridsson M., Hahn-Haegerdal B., Keraenen S., Penttilae M., Joernvall H. FEBS Lett. 324:9-14(1993).

[ 5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).
[ 6] Koga H., Aramaki K., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J. Bacteriol. 166:1089-1095(1986).
[ 7] Joernvall H., Persson B., Du Bois G., Lavers G.C., Chen J.H., Gonzalez P., Rao P.V., Zigler J.S. Jr. FEBS Lett. 322:240-244(1993).
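Because this entry gives two signatures, one for zinc-containing ADHs and one for the quinone oxidoreductase/zeta-crystallin subfamily, a sequence can be sorted by checking which signature it carries. The Python sketch below assumes standard PROSITE syntax. The label strings and the helper names `prosite_to_regex` and `classify` are hypothetical. The two test sequences are synthetic stretches built to satisfy each pattern, not real proteins.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        core = "." if residue == "x" else residue
        if residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"
        if high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)

# Signatures quoted from this entry; the labels are illustrative.
SIGNATURES = {
    "zinc ADH": "G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC]",
    "QOR/zeta-crystallin": "[GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]",
}
COMPILED = {name: re.compile(prosite_to_regex(p)) for name, p in SIGNATURES.items()}

def classify(sequence):
    """Names of all signatures found anywhere in the sequence, in table order."""
    return [name for name, regex in COMPILED.items() if regex.search(sequence)]

print(classify("MSGHEAAGIVESIGEGVKK"))      # ['zinc ADH']
print(classify("GDAALKAASAGGVGLAAAQLAK"))   # ['QOR/zeta-crystallin']
```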

[0239] 47. (aldedh) Aldehyde dehydrogenases active sites

[0240] Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5) are enzymes which oxidize a wide variety of aliphatic and aromatic aldehydes.
[...] semialdehyde into succinate.
- Escherichia coli [...]

Consensus pattern: [FYLVA]-x(3)-G-[...]

[...] terminates androgen action by [...]
- Prostaglandin F synthase, which catalyzes the reduction of prostaglandins H2 and D2 to F2 alpha.
- [...] from apple.
- Morphine 6-dehydrogenase from Pseudomonas putida plasmid pMDH7.2 (gene morA).
- Chlordecone reductase (EC 1.1.1.225), which catalyzes [...]
- 2,5-diketo-D-gluconic acid reductase [...]
- Escherichia coli hypothetical protein [...]
- Yeast hypothetical protein YBR149w.
- Yeast hypothetical protein YJR096w.
These proteins [...] residue]

[ 1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547-9551(1989).
[ 2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

[0242] 49. Alpha amylase. This family is classified as family 13 of the glycosyl hydrolases. The structure is an 8-stranded alpha/beta barrel, interrupted by a ~70 a.a. calcium-binding domain protruding between beta strand 3 and alpha helix 3, and a carboxyl-terminal Greek key beta-barrel domain.

[1] Larson SB, Greenwood A, Cascio D, Day J, McPherson A, J Mol Biol 1994;235:1560-1584.
[0243] 50. Aminotransferases class-I pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. [...]
- 1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the first step in ethylene biosynthesis.
- [...] (gene cobC), which is involved in cobalamin biosynthesis.
- Yeast hypothetical protein [...]

[...] [K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1992).

[ 2] Sung M.H., Tanizawa K., Tanaka H., Kuramitsu S., Kagamiyama H., Hirotsu K., Okamoto A., Higuchi T., Soda K. J. Biol. Chem. 266:2567-2572(1991).

[0244] 51. Aminotransferases class-II pyridoxal-phosphate attachment site
Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. [...]
- 8-amino-7-oxononanoate synthase, which catalyzes an intermediate step in the biosynthesis of biotin: the addition of pimeloyl-CoA to alanine to form 8-amino-7-oxononanoate.
- Histidinol-phosphate aminotransferase, which catalyzes the eighth step in the histidine biosynthetic pathway, the transfer of an amino group from [...] phosphate to [...].
- Serine palmitoyltransferase from yeast (genes LCB1 and LCB2), which catalyzes the condensation of palmitoyl-CoA and serine to form 3-ketosphinganine.
The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG] [K is the pyridoxal-P attachment site]

[ 1] Bairoch A. Unpublished observations (1991).
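In the class-II aminotransferase pattern, the pyridoxal-phosphate-binding lysine always sits three residues after the match start, because the three elements before K are single residues. The Python sketch below uses that fixed offset. It also wraps the regex in a zero-width lookahead, so that every starting position is reported, including hits that would overlap. The helper names and the `K_OFFSET` constant are hypothetical illustrations. The test sequence is synthetic.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        core = "." if residue == "x" else residue
        if residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"
        if high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)

AT_CLASS_II = "T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG]"
K_OFFSET = 3  # K follows three single-residue elements in this particular pattern

def attachment_sites(sequence):
    """1-based positions of the pattern's K residue, overlapping hits included."""
    # A zero-width lookahead lets finditer try every start position.
    regex = re.compile("(?=" + prosite_to_regex(AT_CLASS_II) + ")")
    return [m.start() + K_OFFSET + 1 for m in regex.finditer(sequence)]

# Synthetic sequence containing two hits, made up for illustration.
print(attachment_sites("MNTLSKALAEEGRRTFAKSWGQQA"))   # [6, 18]
```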

[0245] 52. Aminotransferases class-III pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. [...]
- [...] which catalyzes transamination [...], yielding glutamic-5-semialdehyde [...].
- [...] mono- and diamines, and pyruvate. It plays a pivotal role in omega amino acids metabolism.
- 4-aminobutyrate aminotransferase (GABA transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding succinate semialdehyde and glutamic acid.
- DAPA aminotransferase, which catalyzes an intermediate step in the biosynthesis of biotin, the transamination of [...] to form 7,8-diaminopelargonic acid (DAPA).
- 2,2-dialkylglycine decarboxylase (EC 4.1.1.64), a Pseudomonas cepacia enzyme (gene dgdA) that catalyzes the decarboxylating amino transfer of 2,2-dialkylglycine and pyruvate to dialkyl ketone, alanine and carbon dioxide.
- [...] involved in the second step of [...] biosynthesis, [...] aminolevulinic acid.
- Bacillus subtilis aminotransferase [...]

order structure. 

[1] [...] Holak TA, FEBS Lett 1997;401:127-132.

[2] Lux SE, John KM, Bennett V, Nature 1990;345:736-739. 

S3 

following enzymes: 

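- Branched-chain amino-acid aminotransferase (EC 2.6.1.42) (transaminase B), a bacterial (gene ilvE) and eukaryotic enzyme which catalyzes the reversible transfer of an amino group [...] to form leucine [...]. [...] The enzyme catalyzes the transfer of the amino group [...]

- 4-amino-4-deoxychorismate lyase (gene pabC) [1], which converts 4-amino-4-deoxychorismate into p-aminobenzoate (PABA) and pyruvate.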

[ 1] Green J.M., Merkel W.K., Nichols B.P. J. Bacteriol. 174:5317-5323(1992).
[ 2] Bairoch A. Unpublished observations (1992).

[0250] 55. Aminotransferases class-V pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. Class-V includes the following enzymes:
- Phosphoserine aminotransferase, which catalyzes the reversible interconversion of phosphoserine and 2-oxoglutarate [...]. It is required both in the major phosphorylated pathway of serine biosynthesis and in [...]. [...] similar to a rabbit endometrial [...] phosphoserine aminotransferase [3].
- Serine-glyoxylate aminotransferase (gene sgaA) from Methylobacterium extorquens.
- Serine-pyruvate aminotransferase. This enzyme also acts as an alanine-glyoxylate aminotransferase. In vertebrates, it is located in [...] mitochondria.
- Isopenicillin N epimerase (gene cefD). This enzyme is involved in the biosynthesis of penicillin and cephalosporin antibiotics and catalyzes the reversible isomerization of isopenicillin N and penicillin N.
- [...] from the nitrogen fixation operon of some [...]; a highly similar protein has been found in fungi.

Consensus pattern: [...]-x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]

[ 1] Ouzounis C., Sander C. FEBS Lett. 322:159-164(1993).

[0251] 56. Annexins repeated domain signature

Annexins [1 to 6] are a group of calcium-binding proteins that associate reversibly with membranes. They bind to phospholipid bilayers in the presence of calcium, and the binding is specific for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase inhibition, intracellular signalling, [...]. Each of these proteins consists of an N-terminal domain of variable length followed by four or eight copies of a conserved segment of sixty-one residues. The repeat (sometimes known as an 'endonexin fold') consists of five alpha-helices wound into a right-handed superhelix [7]. The proteins known to belong to this family include:
- Annexin I (Lipocortin 1) (Calpactin 2) (p35) (Chromobindin 9).
- Annexin II (Lipocortin 2) (Calpactin 1) (p36) (Chromobindin 8).
- Annexin III (Lipocortin 3) (PAP-III).
- Annexin IV (Lipocortin 4) (Endonexin I).
- Annexin V (Lipocortin 5) (Endonexin 2) (VAC-alpha) (Anchorin CII) (PAP-I).
- Annexin VI (Lipocortin 6) (Chromobindin 20) (p68) (p70). This is the only known annexin that contains eight repeats.
- Annexin VIII (Vascular anticoagulant-beta) (VAC-beta).
- [...] from Drosophila.
- Annexin XIII (Intestine-specific annexin) (ISA).

Consensus pattern: [...]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]

[ 5] Klee C.B. Biochemistry 27:6645-6653(1988).
[ 6] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

[0252] 57. (arf_1) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-binding proteins involved in protein trafficking. They may modulate vesicle budding and uncoating within the Golgi apparatus. ARFs also act as allosteric activators of cholera toxin ADP-ribosyltransferase activity. They are evolutionarily conserved and present in all eukaryotes. At least six forms of ARF are present in mammals and three in budding yeast. A number of proteins are highly related to ARFs but lack the cholera toxin cofactor activity; they are collectively known as ARLs (ARF-like). ARD1 is a 64 Kd mammalian protein of unknown biological function that contains an ARF domain. Proteins from the ARF family are generally included in the RAS superfamily of small GTP-binding proteins [5], but they are only slightly related to the other RAS families. They differ from RAS proteins in that they lack cysteine residues [...]; ARFs are N-terminally myristoylated (the ARLs have [...]). [...] has been selected as a signature pattern.

Consensus pattern: [...]-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]
[ 1] Boman A.L., Kahn R.A. Trends [...]
[ 2] Moss J., Vaughan M. Cell. Signal. [...](1993).

[0253] (arf_2) ATP/GTP-binding site motif A (P-loop)

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5].

There are numerous ATP- or GTP-binding proteins in which the P-loop is found. Protein families for which the presence of such a motif has been noted include:
- Kinesin heavy chains and kinesin-like proteins.
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase.
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase.
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- ADP-ribosylation factors family.
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein.
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E.

Not all ATP- or GTP-binding proteins are picked up by this motif; in some of them the structure of the ATP-binding site is completely different from that of the P-loop. In adenylate kinase there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]

[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
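The P-loop pattern `[AG]-x(4)-G-K-[ST]` is short enough to test by hand. The Python sketch below scans the N-terminal residues of human H-Ras, a classic P-loop protein, as commonly cited. It then shows the adenylate kinase case described in this entry, where Gly replaces Ser/Thr in the last position. `P_LOOP_RELAXED` is a hypothetical variant written for this illustration and is not a PROSITE pattern. The helper names are hypothetical as well, and `"GGPGSGKGT"` is an illustrative adenylate-kinase-like stretch.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        core = "." if residue == "x" else residue
        if residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"
        if high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)

def find_motifs(sequence, pattern):
    """(start, end, matched) for each non-overlapping hit; 1-based, inclusive."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]

P_LOOP = "[AG]-x(4)-G-K-[ST]"
P_LOOP_RELAXED = "[AG]-x(4)-G-K-[STG]"   # hypothetical variant admitting Gly last

HRAS_N_TERM = "MTEYKLVVVGAGGVGKSALTIQLIQ"   # first 25 residues of human H-Ras

print(find_motifs(HRAS_N_TERM, P_LOOP))          # [(10, 17, 'GAGGVGKS')]
print(find_motifs("GGPGSGKGT", P_LOOP))          # []
print(find_motifs("GGPGSGKGT", P_LOOP_RELAXED))  # [(1, 8, 'GGPGSGKG')]
```

Relaxing the last position recovers the adenylate-kinase-like loop. The cost is that the looser pattern will also match more unrelated sequences, which is presumably why the stricter form is kept as the signature.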

[...] which catalyzes the degradation of arginine [...].
- Agmatinase (agmatine ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis of agmatine into putrescine and urea.
- Formiminoglutamase (gene hutG), which hydrolyzes N-formimino-glutamate into glutamate and formamide.
- [...] methanogenic archaebacteria.
These enzymes are proteins of about 300 amino-acid residues. [...] signature [...] which are involved in the binding of the two manganese ions.
Consensus pattern: [...]-[GSTA] [H binds manganese]
Consensus pattern: [...] [... and the H bind manganese]
[0255] 59. (asp) Eukaryotic and viral aspartyl proteases active site

[...] proteolytic enzymes [...]:
- [...] gastricsin).
- Vertebrate chymosin [...].
- Mammalian renin [...].
- Lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
- Fungal proteases such as aspergillopepsin A [...].

Consensus pattern: [...] [D is the active site residue]

[ 1] Foltmann B. Essays Biochem. [...]
[ 2] Davies D.R. Annu. Rev. [...]
[...] 30:4663-4671(1991).

[0256] [...]

The BTB (for BR-C, ttk and bab) [1] or POZ (for Pox virus and Zinc finger) [2] domain [...]

[...] Differ 1995;6:1495-1503. [...] 1998;17:2473-2484.
- pilQ from Pseudomonas aeruginosa [...] coli.
- hrpH from Pseudomonas [...], which is involved in the hypersensitivity response.
- hrpA1 from [...] vesicatoria, which is also involved in the hypersensitivity response in plants.
- mxiD from Shigella flexneri, which is involved in the secretion of the Ipa invasins, which are necessary for penetration of intestinal epithelial cells.
- yscC from Yersinia enterocolitica virulence plasmid pYV, which seems to be involved in the secretion of the Yop virulence proteins.
- The gpIV [...], which seems to be involved in phage assembly and morphogenesis.

[0259] 63. (Bac_globin)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. Almost all globins belong to a large family (see <PDOC00793>); the exceptions form a separate family [2,3]:
- Monomeric hemoglobins from the protozoan [...] Tetrahymena thermophila.
- Cyanoglobin from the cyanobacteria [...].
- [...] LI410 from the chloroplast [...].

[1] Concise Encyclopedia Biochemistry [...]

[0260] 64. Band 7 protein family signature

Mammalian band 7 protein [1] is a membrane phosphoprotein of red blood cells thought to regulate cation conductance by interacting with other proteins of the junctional complex of the membrane skeleton. Structurally, band 7 protein [...]. Proteins structurally related to band 7 include:
- Caenorhabditis elegans protein mec-2 [2]. Mec-2 positively regulates the activity of the putative mechanosensory transduction channel. It may link [...] receptor neurons.
- Caenorhabditis elegans proteins sto-1 to sto-4.
- Caenorhabditis elegans [...].
- Mycobacterium tuberculosis hypothetical protein [...].
- Methanococcus jannaschii hypothetical protein [...].
Structurally, all these proteins consist of a short N-terminal domain which is followed by a transmembrane region and a C-terminal domain of 170 to 350 residues. [...] the transmembrane domain was selected [...].
[ 1] Svensson B., Svendsen I., Hoejrup P., Roepstorff P., Ludvigsen S., [...]
[...] Ward E., Ryals J. [...] Mol. Plant [...]

two distinct inhibitory sites. 



[Schematic representation of the domain, about 70 residues long]
'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.




- Bet v I, the major pollen allergen from birch. Bet v I is the main cause of type I allergic reactions in Europe, North America and USSR.
- Aln g I, the major pollen allergen from alder.
- Api g I, the major allergen from celery.
- Car b I, the major pollen allergen from hornbeam.
- Cor a I, the major pollen allergen from hazel.
- Mal d I, the major pollen allergen from apple.

- Asparagus wound-induced protein AoPR1.

- Bean pathogenesis-related [...].
- Pea abscisic acid-responsive proteins ABR17 and ABR18.
- Potato pathogenesis-related proteins STH-2 and STH-21.
- Soybean stress-induced protein SAM22.




[...] Russell D., Amasino R.M. Plant Mol. Biol. 18:459-466(1992).

[0268] 68. bZIP transcription factors basic domain signature

The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors groups together proteins that contain a basic region mediating sequence-specific DNA-binding followed by a leucine zipper required for dimerization. This family is quite large, therefore only a partial list of members is given here:
- Transcription factor AP-1, which binds selectively to enhancer elements in the cis control regions of SV40 and metallothionein IIA. AP-1, also known as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17) oncogene v-jun.
- Jun-B and Jun-D, probable transcription factors which are highly similar to jun/AP-1.
- The fos protein, a proto-oncogene that forms a non-covalent dimer with c-jun.
- Mammalian cAMP response element binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, [...] and LRF-1.
- Maize Opaque 2, a trans-acting transcriptional activator involved in [...] during endosperm development.
- Arabidopsis G-box binding factors GBF1 to GBF4, Parsley CPRF-1 to CPRF-3, Tobacco TAF-1 and wheat EMBP-1. All these proteins bind the G-box promoter elements of many plant genes.
- Drosophila protein Giant, which represses the expression of both the kruppel and knirps segmentation gap genes.
- Drosophila Box B binding factor 2 (BBF-2), a transcriptional activator that binds to [...] alcohol dehydrogenase and yolk protein genes.
- Drosophila segmentation protein cap'n'collar (gene cnc), which is involved in head morphogenesis.
- Caenorhabditis elegans skn-1, a developmental protein involved in the fate of ventral blastomeres in the early embryo.
- Yeast GCN4 transcription factor, which regulates the expression of amino acid-synthesizing enzymes in response to amino acid starvation.
- Neurospora crassa cys-3, which turns on the expression of [...] catabolic enzymes.
- Yeast MET28, a transcriptional activator of [...].
- Yeast PDR4 (or YAP1), a transcriptional activator of the genes for some oxygen detoxification enzymes.
- Epstein-Barr virus trans-activator protein BZLF1.

Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]



- Pyruvate carboxylase (EC 6.4.1.1).
- Acetyl-CoA carboxylase (EC 6.4.1.2).
- Propionyl-CoA carboxylase (EC 6.4.1.3).
- Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4).
- Geranoyl-CoA carboxylase (EC 6.4.1.5).
- Urea carboxylase (EC 6.3.4.6).
- Oxaloacetate decarboxylase (EC 4.1.1.3).
- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).
- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).
- Methylmalonyl-CoA carboxyltransferase (EC 2.1.3.1) (transcarboxylase).
Sequence data reveal that the region
[ 4] Shenoy B.-C., Xie Y., [...]
complexes are:

- Pyruvate dehydrogenase complex.
- 2-oxoglutarate dehydrogenase complex.

- Branched-chain 2-oxo acid dehydrogenase complex.

[0272] The E2 acyltransferases [...] bind a single lipoyl group [...] lipoyl groups [3].

[...] found in the following proteins: [...]

[ 1] Yeaman S.J. Biochem. J. 257:625-632(1989).
[ 6] Priefert H., Hein S., Krueger N., [...]

catalytic subunit.
- Caenorhabditis elegans protein unc-13, whose [...]

[...] and a C2-like domain C-terminal of a PH domain.

- Perforin.

[...] contain a C2 domain.
- Yeast hypothetical protein YML072C has a [...]

[...] Hensey C. [...] Eur. J. Biochem. 208:547-557(1992).
[ 3] Brose N., Hofmann K., [...]

[...] terminal region have been developed.
Consensus pattern: [...]

Consensus pattern: D-[LIVMFY]-x-[...]

[...] Field J., Rigas M., Rodgers L., Wigler M., Young D. Mol. Biol. Cell [...]

[...] 107:1671-1678(1994).

[0277] 72. CAP_GLY (CAP-Gly domain). [...] may be a member but has not been included. It has [...]

Trends Biochem Sci 1993;18:82-83. 
- [...] CLIP-170, a 160 Kd protein associated with intermediate [...]

for spindle pole body fusion during conjugation. 

CAP-Gly domain. 

- Caenorhabditis elegans hypothetical protein M01A8.2. 

- Yeast hypothetical protein YNL148c.

of the domain and includes five of the six conserved glycines. 
. consensus pattern: Goc(8,10 H FYW]-x-GHU^^^ 

[ 1] Riehemann K., Sorg C. Trends Biochem. Sci. 18:82-83(1993). 
[0279] 73. (CBD1) 


known to contain such a domain are: 

- Endoglucanase I (gene egl1) from Trichoderma reesei.

- Endoglucanase II (gene egl2) from Trichoderma reesei.

Trichoderma reesei, and Trichoderma viride.
- Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.
- Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.

- Endoglucanases B, C2, F and K from Fusarium oxysporum. 

There are four conserved cysteines in this type of CBD domain, all involved in disulfide bonds.



xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx 
**************************** 

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern. 

[0283] Such a domain has also been found in a putative polysaccharide binding protein from the red alga Porphyra
purpurea [2]. Structurally, this protein consists of four tandem repeats of the CBD domain.

[0284] Consensus pattern: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C [The four C's are
involved in disulfide bonds]. Sequences known to belong to this class detected by the pattern: ALL.
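The consensus patterns in this section are written in PROSITE notation: elements separated by '-', 'x' for any residue, square brackets for allowed residues, curly braces for excluded residues, '(n)' or '(n,m)' for repeat counts, and '<' or '>' for an N- or C-terminal anchor. As a minimal sketch, assuming an ordinary Python regular expression is an acceptable stand-in for a dedicated PROSITE scanner, the notation can be translated mechanically. The helper name prosite_to_regex is hypothetical, bracketed notes such as "[The four C's are involved in disulfide bonds]" must be stripped before translation, and the test sequence is made up rather than taken from a real cellulose-binding domain.

```python
import re

# One PROSITE pattern element: optional N-terminal anchor '<', a residue
# specification (x, a single residue, [allowed] or {excluded}), an optional
# repeat count such as (3) or (4,7), and an optional C-terminal anchor '>'.
_ELEMENT = re.compile(
    r"(<?)(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a PROSITE-style pattern into a regex string."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_anchor, spec, low, high, c_anchor = m.groups()
        if spec == "x":
            core = "."                       # any residue
        elif spec.startswith("{"):
            core = "[^" + spec[1:-1] + "]"   # any residue except those listed
        else:
            core = spec                      # a single residue or an [allowed] set
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if n_anchor else "") + core + ("$" if c_anchor else ""))
    return "".join(pieces)


# The fungal CBD signature quoted above, with its bracketed note removed.
cbd = prosite_to_regex("C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C")
print(cbd)  # CGG.{4,7}G.{3}C.{5}C.{3,5}[NHG].[FYWM].{2}QC

# A made-up sequence (not a real CBD) built to satisfy the pattern.
toy = "MK" + "CGGAAAAGAAACAAAAACAAANAFAAQC" + "ST"
print(re.search(cbd, toy) is not None)  # True
```

Because regex quantifiers backtrack, variable-length gaps such as x(4,7) or x(3,5) need no special handling beyond the {low,high} translation.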

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 2] Liu Q., van der Meer J.P., Reith M.E.

[0285] 74. CBS domain. 3D structure: found as a subdomain in the TIM barrel of inosine monophosphate dehydrogenase.
CBS domains are small intracellular modules, mostly found in two or four copies within a protein. CBS domains are found
in cystathionine-beta-synthase (CBS), where mutations lead to homocystinuria. Two CBS domains are found in inosine
monophosphate dehydrogenase from all species; however, the CBS domains are not needed for activity. Two CBS
domains are found in intracellular loops of several chloride channels. Mutations in this domain of Swiss:P35520 lead
to homocystinuria. Number of members: 414

[1] Medline: 97172695. The structure of a domain common to archaebacteria and the homocystinuria disease
protein. Bateman A; Trends Biochem Sci 1997;22:12-13.

[2] Medline: 96279836. Structure and mechanism of inosine monophosphate dehydrogenase in complex with the
immunosuppressant mycophenolic acid. Sintchak MD, Fleming MA, Futer O, Raybuck SA, Chambers SP, Caron
PR, Murcko MA, Wilson KP; Cell 1996;85:921-930.

[3] CBS domains in ClC chloride channels implicated in myotonia and nephrolithiasis (kidney
stones). Ponting CP; J Mol Med 1997;75:160-163.

[0286] 75. CDP-OH_P_transf (CDP-alcohol phosphatidyltransferase)

All of these members have the ability to catalyze the displacement of CMP from a CDP-alcohol by a second alcohol
with formation of a phosphodiester bond and concomitant breaking of a phosphoric anhydride bond. Number of
members: [...]

A number of phosphatidyltransferases, which are all involved in phospholipid biosynthesis and share the property
of catalyzing the displacement of CMP from a CDP-alcohol by a second alcohol with formation of a phosphodiester
bond and concomitant breaking of a phosphoric anhydride bond, share a conserved sequence region [1,2]. These
enzymes are:

- Ethanolaminephosphotransferase (EC 2.7.8.1) from yeast (gene EPT1). 

- Diacylglycerolcholinephosphotransferase (EC 2.7.8.2) from yeast (gene CPT1). 

- Phosphatidylglycerophosphate synthase (EC 2.7.8.5) (CDP-diacylglycerol-glycerol-3-phosphate
3-phosphatidyltransferase) from bacteria (gene pgsA).

- Phosphatidylserine synthase (EC 2.7.8.8) (CDP-diacylglycerol-serine O-phosphatidyltransferase) from yeast
(gene CHO1) and from Bacillus subtilis (gene pssA).

- Phosphatidylinositol synthase (EC 2.7.8.11) (CDP-diacylglycerol-inositol 3-phosphatidyltransferase) from yeast
(gene PIS).

These enzymes are proteins of 200 to 400 amino acid residues. The conserved region contains three aspartic
acid residues and is located in the N-terminal section of the sequences.

- Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D 

[1] Medline: 97075020. Two-dimensional 1H-NMR of transmembrane peptides from Escherichia coli
phosphatidylglycerophosphate synthase in micelles. Morein S, Trouard TP, Hauksson JB, Rilfors L, Arvidson G,
Lindblom G; Eur J Biochem 1996;241:489-497.

[ 1] Nikawa J.-I., Kodaki T., Yamashita S.
J. Biol. Chem. 262:4876-4881(1987). 
[ 2] Hjelmstad R.H., Bell R.M. 
J. Biol. Chem. 266:5094-5134(1991). 

[0287] 76. CHOD (Cholesterol oxidase). Members of the GMC oxidoreductase family. Number of members: 3
[0288] [1] Medline: 94032271. Crystal structure of cholesterol oxidase complexed with a steroid substrate:
implications for flavin adenine dinucleotide dependent alcohol oxidases. Li J, Vrielink A, Brick P, Blow DM;
Biochemistry 1993;
£S^^iS3V»> - ** RM *" ca * ed: m6,ha "°' * " w " "

6ela i„. acataldenyd, ♦ reduced I «^>P« a Beaction ^alyzed: glucose * unknown accepto, 

. cnSSd^^ 

meLaseo. hyd^ncyanldetron, w«c.^*On.o.* W rW«* 
one is located in the central sect.on. 

• : U..UII ■"' 

, „ ca.ene, D R. J- M* Bid. ^ 81 !*i ( lf^ (1994) 

\i Henikon "SSSJ ^ B .To. B. Mo,. Microbiol amOXW 

" iroCr^Po-^'Spnvsio^^s^us,,™. 

. y c y [KR1 Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-lLV]-[PT]-[KRPl 
. Consensus pattern: Y-S-x-[KR] Y Ml Jt H *) l 
- Consensus pattern: H-x-P-E-x-H-[IV]-L-L-F-[KR]

ri D J Reed S.I., Tainer J.A. Science 262:387-395(1993). 

Number of members: 16. Casein kinase II (CK-2) [1] [...] generally phosphorylates Ser

S found both in the cytoplasm ^.^^^^S^- CK-2 ^ as an 

or Thrat the N-terminalof stretch of ac>d,c residues (see<PDOO , ^ ^ ^ c fe|ated 

» o two catatytic subunits (alpha) and two W^g™* Q»» )■ J ^ plants express two torms of 

informs oflhe catalytic subunit: alpha and ct f^^ 

^^21^^ been used as a signature pattern. 

55 . ConS en SU spattem«^^ 

[ 1] Allende J.E., Allende C.C. FASEB J. 9:313-323(1995).
[ 2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).
[0293] 79. CLP_protease (Clp protease) 

These proteins belong to family S14 in the classification of peptidases. 


- !- The Clp protease has an active site catalytic triad. In E. coli Clp protease, ser-111, his-136 and asp-185 form 
the catalytic triad. 

- !- Swiss:P48254 has lost all of these active site residues and is therefore inactive.

- ! - Swiss:P42379 contains two large insertions, Swiss:P42380 contains one large insertion. Number of members: 38 


[0294] The endopeptidase Clp (EC 3.4.21.92) from Escherichia coli cleaves peptides in various proteins in a process
that requires ATP hydrolysis [1,2]. Clp is a dimeric protein which consists of a proteolytic subunit (gene clpP) and either
of two related ATP-binding regulatory subunits (genes clpA and clpX). ClpP is a serine protease which has a
chymotrypsin-like activity. Its catalytic activity seems to be provided by a charge relay system similar to that of the trypsin
family of serine proteases, but which arose by independent convergent evolution. Proteases highly similar to ClpP
have been found to be encoded in the chloroplast genome of plants and also seem to be present in other
eukaryotes. The sequences around two of the residues involved in the catalytic triad (a serine and a histidine) are
highly conserved and can be used as signature patterns specific to that category of proteases.

- Consensus pattern: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] [S is the active site residue]

- Consensus pattern: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P [H is the active site residue]
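The bracketed notes on the ClpP signatures identify which pattern position is the catalytic residue. A minimal sketch of how such a note can be put to use, assuming a hand translation of the serine signature into a Python regular expression with a named group on the flagged Ser; the names CLPP_SER and active_site_positions are hypothetical, and the test sequences are made up rather than taken from a real ClpP.

```python
import re
from typing import List

# Hypothetical hand translation of T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA];
# the named group "site" marks the S flagged as the active-site residue.
CLPP_SER = re.compile(r"T.{2}[LIVMF]G.A[SAC](?P<site>S)[MSA][PAG][STA]")


def active_site_positions(sequence: str) -> List[int]:
    """Return 1-based positions of the putative active-site Ser in each match."""
    return [m.start("site") + 1 for m in CLPP_SER.finditer(sequence.upper())]


print(active_site_positions("MKTICMGQAASMGA"))  # [11]
print(active_site_positions("MKTICMGQAAAMGA"))  # [] (the Ser is replaced by Ala)
```

The second call illustrates the idea behind the Swiss:P48254 remark: once the catalytic Ser is substituted, the signature no longer reports an active-site position.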

[1] Medline: 98050920. The structure of ClpP at 2.3 angstroms resolution suggests a model for ATP-dependent
proteolysis. Wang J, Hartling JA, Flanagan JM; Cell 1997;91:447-456.
[ 1] Maurizi M.R., Clark W.P., Kim S.-H., Gottesman S. J. Biol. Chem. 265:12546-12552(1990).

[ 2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992).
[ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). 

[0295] 80. CNG_membrane (Transmembrane region cyclic Nucleotide Gated Channel) 
[1] Medline: 94224763. Cyclic nucleotide-gated channels: an expanding new family of ion channels. Yau KW; Proc Natl
Acad Sci USA 1994;91:3481-3483. 

This family is found N-terminal to the cNMP_binding domain. Number of members: 56. Proteins that bind cyclic
nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is
the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp), where such a domain
is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure.
Such a domain is known to exist in the following proteins: 

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies
of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and
a regulatory chain which contains both copies of the domain. The cGPKs are single-chain enzymes that include
the two copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to
an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most
cAPKs.

- Vertebrate cyclic nucleotide-gated ion channels. Two such cation channels have been fully characterized. One
is found in rod cells, where it plays a role in visual signal transduction. It specifically binds to cGMP, leading to an
opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar,
cAMP-binding channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be
essential for maintenance of the structural integrity of the beta-barrel. Two signature patterns have been developed
for this domain. The first pattern is located within beta-barrels 2 and 3 and contains the first two conserved Gly. The
second pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well as the three other
invariant residues.

- Consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G 

- Consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989). 
[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991). 
[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).



[0296] 81. COX10,ctaB_cyoE (Cytochrome c oxidase assembly factor)
[1] 

Mogi T, Saiki K, Anraku Y;

P^elthaU«ae,aschap^^ 

. consensu pattern: l ED|.x-D-x(2VM-x-B-T-x(2)-B.X(4>-G 
[ 1] Nobrega M.P., Nobrega F.G., Tzagoloff A.

J. Biol. Chem. 267:24273-24278(1992). 
[ 3] Chepuri V., Gennis R.B.
J. Biol. Chem. 265:12978-12986(1990).

[0299] 82. COX3 (Cytochrome c oxidase subunit III)
This family corresponds to chains C and P.
[1]

Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S,
Science 1996;272:1136-1144. 

83. COX5B (Cytochrome c oxidase subunit Vb)

[1]

S; 

Science 1996;272:1136-1144.

This family consists of chains F and S.

Number of members: 10

[0301] Cytochrome c oxidase (EC 1.9.3.1) is an oligomeric enzymatic complex which is a component of the
respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this
enzyme complex is located in the mitochondrial inner membrane; in prokaryotes it is found in the plasma
membrane. In addition to the three large subunits that form the catalytic core of the enzyme complex there are, in [...]
[ 1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-148(1983).

[ 2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R. Biochim. Biophys. Acta 1129:100-104(1991). 

[0302] 84. COesterase (Carboxylesterases) 

The PRINTS entry is specific to acetylcholinesterase.
Number of members: 273 

[0303] Higher eukaryotes have many distinct esterases. Among the different types are those which act on carboxylic
esters (EC 3.1.1.-). Carboxylesterases have been classified into three categories (A, B and C) on the basis of
differential patterns of inhibition by organophosphates. The sequence of a number of type-B carboxylesterases indicates
[1,2,3] that the majority are evolutionarily related. This family currently consists of the following proteins:

- Acetylcholinesterase (EC 3.1.1.7) (AChE) [E1] from vertebrates and from Drosophila. 

- Mammalian cholinesterase II (butyryl cholinesterase) (EC 3.1.1.8). Acetylcholinesterase and cholinesterase II are
closely related enzymes that hydrolyze choline esters [4].

- Mammalian liver microsomal carboxylesterases (EC 3.1.1.1). 

- Drosophila esterase 6, produced in the anterior ejaculatory duct of the male insect reproductive system where it 
plays an important role in its reproductive biology. 

- Drosophila esterase P.
- Culex pipiens (mosquito) esterases B1 and B2.

- Myzus persicae (peach-potato aphid) esterases E4 and FE4. 

- Mammalian bile-salt-activated lipase (BAL) [5], a multifunctional lipase which catalyzes fat and vitamin absorption.
It is activated by bile salts in infant intestine where it helps to digest milk fats. 

- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).

- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.

- Caenorhabditis gut esterase (gene ges-1).

- Duck fatty acyl-CoA hydrolase, medium chain (EC 3.1.2.14), an enzyme that may be associated with peroxisome
proliferation and may play a role in the production of 3-hydroxy fatty acid diester pheromones. 

- Membrane enclosed crystal proteins from slime mold. These proteins are most probably esterases; the vesicles
where they are found have therefore been termed esterosomes.

[0304] So far two bacterial proteins have been found to belong to this family: 

- Phenmedipham hydrolase (phenylcarbamate hydrolase), an Arthrobacter oxidans plasmid-encoded enzyme (gene
pcd) that degrades the phenylcarbamate herbicides phenmedipham and desmedipham by hydrolyzing their central
carbamate linkages.

- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA). 

[0305] The following proteins, while having lost their catalytic activity, contain a domain evolutionarily related to that
of type-B carboxylesterases:

- Thyroglobulin (TG), a glycoprotein specific to the thyroid gland, which is the precursor of the iodinated thyroid 
hormones thyroxine (T4) and triiodothyronine (T3).

- Drosophila protein neuractin (gene nrt), which may mediate or modulate cell adhesion between embryonic cells
during development.

- Drosophila protein glutactin (gene glt), whose function is not known.

[0306] As is the case for lipases and serine proteases, the catalytic apparatus of esterases involves three residues
(catalytic triad): a serine, a glutamate or aspartate, and a histidine. The sequence around the active site serine is well
conserved and can be used as a signature pattern. A conserved region located in the N-terminal section containing a
cysteine involved in a disulfide bond has been selected as a second signature pattern.

- Consensus pattern: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G [S is the active site residue]

- Consensus pattern: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR] [C is involved in a disulfide bond]


[ 1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).

[ 2] Krejci E., Duval N., Chatonnet A., Vincens P., Massoulie J. Proc. Natl. Acad. Sci. U.S.A. 88:6647-6651(1991).
[ 3] Cygler M., Schrag J.D., Sussman J.L., Harel M., Silman I., Gentry M.K., Doctor B.P. Protein Sci. 2:366-382(1993).

[ 4] Lockridge O. BioEssays 9:125-128(1988). 

[ 5] Wang C.-S., Hartsuck J.A. Biochim. Biophys. Acta 1166:1-19(1993). 
[0307] 85. CPSase_L_chain (Carbamoyl-phosphate synthase (CPSase)) 
[1] 

Medline: 94347758 

Three-dimensional structure of the biotin carboxylase subunit of acetyl-CoA carboxylase.
Waldrop GL, Rayment I, Holden HM; 

Biochemistry 1994;33:10249-10256. 

[1] 

Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD. 
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR; 

Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/or
pyrimidines.

The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and a large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_sm_chain.

The small chain has a GATase domain in the carboxyl terminus. 
See GATase. 

[0308] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

[0309] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In
bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways, while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase), necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme, called URA2 in yeast, rudimentary in Drosophila and CAD in mammals. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part, which encompasses the
dihydroorotase and aspartate transcarbamylase activities.

[0310] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.

[0311] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP, and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.
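The size and the duplication statements in [0311] can be cross-checked with rough arithmetic. Assuming an average residue mass of about 110 Da (a common rule of thumb, not a figure given in the source), a 120 Kd domain corresponds to roughly 1,100 residues, which is consistent with two subdomains of about 500 residues each plus some connecting sequence:

$$
N \approx \frac{120\,000\ \text{Da}}{110\ \text{Da per residue}} \approx 1\,090\ \text{residues} \approx 2 \times 500 + 90
$$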

[0312] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
[...]. Two conserved regions that are probably important for binding ATP and/or catalytic activity have been
selected as signatures for the subdomain.

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG] 

- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R.

J. Biol. Chem. 265:10395-10402(1990). 
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B.

BioEssays 15:157-164(1993). 

[0314] 86. CPSase_sm_chain (Carbamoyl-phosphate synthase small chain, CPSase domain)
[...] selected as signatures for the subdomain.

[0321] 87. CRAL_TRIO (CRAL/TRIO domain)
[1] 

Medline: 98121119

Crystal structure of the Saccharomyces cerevisiae phosphatidylinositol-transfer protein.

Sha B, Phillips SE, Bankaitis VA, Luo M.


[1] 

Medline: 94255482 
Crystal structure of CspA, the major cold shock protein of Escherichia coli.
Schindelin H, Jiang W, Inouye M, Heinemann U; 
Proc Natl Acad Sci USA 1994;91:5119-5123. 

A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding proteins.
This domain, which is known as the cold-shock domain (CSD), is present in the proteins listed below.

- Escherichia coli protein CS7.4 (gene cspA), which is induced in response to low temperature [...]
and which binds to and stimulates the transcription of the CCAAT-containing promoters of the H-NS protein and

- KmalianYbox 

- Xenopus Y box binding proteins -1 and -2 (Y1 and Y2). Proteins that bind to the CCAAT-containing Y box of

. SpTsbS^ 

- Enhancer factor I subunit A (EFI-A) (dbpB). A protein that also binds to the CCAAT motif in various gene promoters.

- DbpA, a Human DNA-binding protein of unknown specificity. 

- Bacillus subtilis cold-shock proteins cspB and cspC. 

- Streptomyces clavuligerus protein SC 7.0. 

- [...] CSD domain. The function of Unr is not yet known, but it could be a multivalent DNA-binding protein.

[0324] As a signature pattern for the CSD domain, the most conserved region [...] has been selected. It must be
noted that the beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

- Consensus pattern: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]

[ 1] Doniger J., Landsman D., Gonda M.A., Wistow G. 

New Biol. 4:389-395(1992). 
[ 2] Wistow G. 

Nature 344:823-824(1990). 
[3] Jones P.G., Inouye M. 

Mol. Microbiol. 11:811-818(1994).
[ 4] Landsman D. 

Nucleic Acids Res. 20:2861-2864(1992). 

[0325] 89. CTF_NFI (CTF/NF-I family) 

Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] (also known as TGGCA-binding

[...] of activating transcription and DNA replication. The CTF/NF-I family name has also been
NF-I genes have been classified as: 

- The CTF-like factors subfamily (prototype form: CTF-1) [4]

- The NFI-X proteins. 

- The NFI-A proteins. 

- The NFI-B proteins. 
[0329] So far, all CTF/NF-I family members appear to have similar transcription and replication activities.
[0330] CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200-amino-acid sequence, almost
perfectly conserved in all species and genes sequenced, mediates site-specific DNA recognition, protein dimerization and
Adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain. This
activation domain is the target of gene expression regulatory pathways elicited by growth factors, and it interacts with
basal transcription factors and with histone H3 [6].

[0331] A perfectly conserved, highly charged 12-residue peptide located in the N-terminal part of CTF/NF-I has been
selected as a specific signature for this family of proteins.

- Consensus pattern: R-K-R-K-Y-F-K-K-H-E-K-R

[ 1] Mermod N., O'Neill E.A., Kelly T.J., Tjian R. 

Cell 58:741-753(1989). 
[ 2] Rupp R.A.W., Kruse U., Multhaup G., Goebel U., Beyreuther K.,
Sippel A.E.

Nucleic Acids Res. 18:2607-2616(1990). 
[ 3] Nagata K., Guggenheimer R.A., Enomoto T, Lichy J.H., Hurwitz J. 

Proc. Natl. Acad. Sci. U.S.A. 79:6438-6442(1982). 
[ 4] Santoro C., Mermod N., Andrews P.C., Tjian R.
Nature 334:218-224(1988).

[ 5] Gil G., Smith J.R., Goldstein J.L., Slaughter C.A., Orth K., Brown M.S., Osborne T.F.

Proc. Natl. Acad. Sci. U.S.A. 85:8963-8967(1988).
[ 6] Alevizopoulos A., Dusserre Y., Tsai-Pflugfelder M., von der Weid T., Wahli W., Mermod N.

Genes Dev. 9:3051-3066(1995). 


[0332] 90. Calsequestrin (Calsequestrin) 
Number of members: 13 

[0333] Calsequestrin is a moderate-affinity, high-capacity calcium-binding protein of cardiac and skeletal muscle [1],
where it is located in the lumenal space of the sarcoplasmic reticulum terminal cisternae. Calsequestrin acts as a
calcium buffer and plays an important role in muscle excitation-contraction coupling. It is a highly acidic protein of
about 400 amino acid residues that binds more than 40 moles of calcium per mole of protein. There are at least two
different forms of calsequestrin: one expressed in cardiac muscle and another in skeletal muscle. Both
forms have highly similar sequences.

[0334] Two signature sequences have been developed. The first corresponds to the N-terminus of the mature protein;
the second is located just in front of the C-terminus of the protein, which is composed of a highly acidic tail of variable
length.

- Consensus pattern: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V 

- Consensus pattern: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D 


[0335] [ 1] Treves S., Vilsen B., Chiozzi P., Andersen J.P., Zorzato F.

Biochem. J. 283:767-772(1992). 
[0336] 91. Carboxyl_trans (Carboxyl transferase domain)

[1] 

Medline: 93374821

Primary structure of the monomer of the 12S subunit of transcarboxylase as deduced from DNA and characterization 
of the product expressed in Escherichia coli. 
Thornton CG, Kumar GK, Haase FC, Phillips NF, Woo SB, Park VM, 
Magner WJ, Shenoy BC, Wood HG, Samols D; J Bacteriol 1993;175:5301-5308. 
[2]

Medline: 93358891 
Molecular evolution of biotin-dependent carboxylases. 
Toh H, Kondo H, Tanabe T; 
Eur J Biochem 1993;215:687-696. 
All of the members in this family are biotin-dependent carboxylases.

The carboxyl transferase domain carries out the following reaction; transcarboxylation from biotin to an acceptor mol- 
ecule. There are two recognised types of carboxyl transferase. One of them uses acyl-CoA and the other uses 2-oxo 
acid as the acceptor molecule of carbon dioxide. 






EP 1 033 405 A2 

All of the members in this family utilise acyl-CoA as the acceptor molecule.

Number of members: 146

[0338] Chalcone synthases (CHS) (EC 2.3.1.74) and stilbene synthases (STS) (formerly known as resveratrol synthases)
are related plant enzymes [1]. [...] and STS a key enzyme [...] the addition of three molecules of malonyl-CoA to a [...]

active site residue] 

[ 1] Schroeder J., Schroeder G.

Z. Naturforsch. 45C:1-8(1990).
[ 2] Lanz T., Tropf S., Marner F.J., Schroeder J., Schroeder G.

J. Biol. Chem. 266:9971-9976(1991). 

[0341] 93. Chorismate_synt (Chorismate synthase)
Number of members: 19

Chorismate synthase catalyzes the last of the seven steps in the shikimate pathway which is



in basic 



« o p Q H rn C l-x(21-[LIVMl-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV] 

[ 1] Schaller A., Schmid J., Leibinger U., Amrhein N.

J. Biol. Chem. 266:21434-21438(1991).
[ 2] Jones D.G.L., Reusser U., Braus G.H.

Mol. Microbiol. 5:2143-2152(1991). 

[0344] 94. Clat_adaptor_s (Clathrin adaptor complex small chain)

n addition to clathhn, the CCV are composed d a numb^ 
areknownasadaptororclathrinassembrypro- 

with the cytoplasmic tails of membrane prote ns J^9 to JJ , he Go|gi complex and AP-2 which » associated w,th 
adaptor complexes are known: AP-1 wh,ch ,s assoc^ted * 8 £ Qf ^ , arge chains . the adapt.ns - 

K^^-AP-tandAP^ 

AP19 have also been found in yeast ^^^JXte thai reversibly associates with Golg. mem- 
to the trans Golgi network. 
[0347] A conserved region in the central section of these proteins has been selected as a signature pattern.
- Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F 

[ 1] Pearse B.M., Robinson M.S. 

Annu. Rev. Cell Biol. 6:151-171(1990).
[ 2] Kirchhausen T., Davis A.C., Frucht S., O'Brine Greco B., Payne G.S.,

Tubb B.

J. Biol. Chem. 266:11153-11157(1991).
[ 3] Nakai M., Takada T., Endo T.

Biochim. Biophys. Acta 1174:282-284(1993).
[ 4] Phan H.L., Finlay J.A., Chu D.S., Tan P.K., Kirchhausen T., Payne G.S.

EMBO J. 13:1706-1717(1994).
[ 5] Kuge O., Hara-Kuge S., Orci L., Ravazzola M., Amherdt M., Tanigawa G.,

Wieland F.T., Rothman J.E. 

J. Cell Biol. 123:1727-1734(1993). 

[0348] 95. Clathrin_lg_ch (Clathrin light chain)

In higher eukaryotes two genes code for distinct but related light chains: LC(a) and LC(b). Each of the two genes
karyotes. 

: charged region located in the C-terminal extremity of all known clathrin light chains. 

- Consensus pattern: F-L-A-Q-Q-E-S 

[ 1] Keen J.H. 
Annu. Rev. Biochem. 59:415-438(1990).

[ 2] Brodsky F.M. 

Science 242:1396-1402(1988). 
[ 3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H.,
Ponnambalam S., Parham P.
Trends Biochem. Sci. 16:208-213(1991).

Number of members: 79 
[1]

Medline: 92191269 

Folding and trimerization of clathrin subunits at the triskelion hub.
Nathke IS, Heuser J, Lupas A, Stock J, Turck CW, Brodsky FM; 
Cell 1992;68:899-910.
[2]
Medline: 88097376
Clathrin heavy chain: molecular cloning and complete primary structure. 
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ, 
Ramachandran KL, Smart J, Brosius J; 
Proc Natl Acad Sci USA 1987;84:8805-8809.
[0353] 97. Collagen (Collagen triple helix repeat (20 copies)) 
[1] Medline: 94059583 
New members of the collagen superfamily 
Mayne R, Brewton RG; 
Curr Opin Cell Biol 1993;5:883-890. 
Scurvy is associated with collagens. 

The alignment contains 20 copies of the G-X-Y repeat that [...] proline. Collagens are post [...]
helical structure.
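Because the repeat unit is G-X-Y, a glycine occupies every third position throughout a collagenous stretch. As a minimal sketch, assuming that "consecutive triplets beginning with Gly" is an adequate proxy for the repeat (it ignores which residues occupy the X and Y positions), the longest such run can be counted in each of the three reading frames. The function name longest_gxy_run is hypothetical and the test sequences are made up.

```python
def longest_gxy_run(sequence: str) -> int:
    """Hypothetical helper: largest number of consecutive G-x-y triplets.

    Each of the three frames is scanned separately; a triplet counts only if
    its first residue is glycine, and the run resets at the first triplet
    that does not start with G.
    """
    best = 0
    for frame in range(3):
        run = 0
        # Step through complete triplets only (i, i+1 and i+2 must all exist).
        for i in range(frame, len(sequence) - 2, 3):
            if sequence[i] == "G":
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


print(longest_gxy_run("AGPPGPAGEK"))  # 3 (frame starting at the second residue)
print(longest_gxy_run("GPP" * 20))    # 20
```

An uninterrupted stretch of 20 G-X-Y copies, the number of repeats in the alignment described above, therefore scores 20.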

Number of members: 2125 

[0354] 98. Coprogen_oxidas (Coproporphyrinogen III oxidase)

such as heme, chlorophyll or cobalamin. [...] seems to be important

[0355] Coproporphyrinogen III oxidase is an enzyme [...] sources show that this enzyme

. Consensus pattern: K-x-W-C-xt^-pY^OHUVMl-x-H-R-x-E-x-R-G^LIVMJ-G-G-lL.VMl-F-F-D 

[ 1] Xu K., Elliott T.

J. Bacteriol. 175:4990-4999(1993).
[ 2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S.

J. Biol. Chem. 268:21359-21363(1993).
[ 3] Camadro J.M., Chambon H., Jolles J., Labbe P.

Eur. J. Biochem. 156:579-587(1986).
[ 4] Xu K., Elliott T.

J. Bacteriol. 176:3196-3203(1994). 

[0356] 99. Corona_nucleoca (Coronavirus nucleocapsid protein)
[1] 

Medline: 98087828 
Identification of a specific interaction between the 
coronavirus mouse hepatitis virus A59 nucleocapsid protein 
and packaging signal. 
Molenkamp R, Spaan WJ; 
Virology 1997;239:78-86. 
Number of members: 44 
[0357] 100. Cu-oxidase (Multicopper oxidase) 

[1] 

Medline: 90126844 

The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin.
Modelling and structural relationships. 
Messerschmidt A, Huber R; 
Eur J Biochem 1990;187:341-352. 
Number of members: 150

Multicopper oxidases are enzymes that possess three spectroscopically different copper centers. These [...]
this family are:
- Laccase (EC 1.10.3.2) (urushiol oxidase), an enzyme found in fungi and plants, which oxidizes many different types
of phenols and diamines. 

- Ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme.

- Ceruloplasmin (EC 1.16.3.1) (ferroxidase), a protein found in the serum of mammals and birds, which oxidizes a
great variety of inorganic and organic substances. Structurally, ceruloplasmin exhibits internal sequence homology
and seems to have evolved from the triplication of a copper-binding domain similar to that found in laccase and
ascorbate oxidase.

[0359] In addition to the above enzymes there are a number of proteins which, on the basis of sequence similarities,
can be said to belong to this family. These proteins are:

- Copper resistance protein A (copA) from a plasmid in Pseudomonas syringae. This protein seems to be involved 
in the resistance of the microbial host to copper. 

- Blood coagulation factor V (Fa V). 

- Blood coagulation factor VIII (Fa VIII) [E1].

- Yeast FET3 [3], which is required for ferrous iron uptake. 

- Yeast hypothetical protein YFL041w and SpAC1F7.08, the fission yeast homolog.

[0360] Factors V and VIII act as cofactors in blood coagulation and are structurally similar [4]. Their sequence consists
of a triplicated A domain, a B domain and a duplicated C domain, in the following order: A-A-B-A-C-C. The A-type
domain is related to the multicopper oxidases.

[0361] Two signature patterns have been developed for these proteins. Both patterns are derived from the same 
region, which in ascorbate oxidase, laccase, in the third domain of ceruloplasmin, and in copA, contains five residues 
that are known to be involved in the binding of copper centers. The first pattern does not make any assumption on the 
presence of copper-binding residues and thus can detect domains that have lost the ability to bind copper (such as
those in Fa V and Fa VIII), while the second pattern is specific to copper-binding domains. 

- Consensus pattern: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW]

- Consensus pattern: H-C-H-x(3)-H-x(3)-[AG]-[LM] [The first two H's are copper type 3 binding residues] [The C,
the 3rd H, and L or M are copper type 1 ligands]
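The consensus patterns in this section use the PROSITE notation: elements are separated by "-", "x" stands for any residue, square brackets list the allowed residues, curly braces list forbidden residues, and a number in parentheses gives a repeat count or range. As a minimal sketch, assuming plain Python with only the standard re module, such a pattern can be translated into a regular expression and scanned along a protein sequence. The helper names prosite_to_regex and find_motifs and the test peptide are hypothetical, and terminal anchors (< and >) are deliberately left out.

```python
import re

# Hypothetical helper: translate a PROSITE-style pattern into a regex string.
# Supported elements: x, single residues, [ABC], {ABC}, and repeats e(n) or e(n,m).
# Terminal anchors (< and >) are not handled in this sketch.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # residue or [..] class is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every hit, overlaps included."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    scanner = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence)]


# Second multicopper oxidase pattern from the text, scanned against a made-up peptide.
copper_site = "H-C-H-x(3)-H-x(3)-[AG]-[LM]"
print(prosite_to_regex(copper_site))                  # HCH.{3}H.{3}[AG][LM]
print(find_motifs(copper_site, "MKTHCHAAAHGGGALPQ"))  # [(4, 'HCHAAAHGGGAL')]
```

A hit only marks a candidate site; as the text notes for the first pattern, a sequence match does not by itself show that the domain still binds copper.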

[0362] 101. Cullin (Cullin family)
Number of members: 24 

[0363] The following proteins are collectively termed cullins [1]: 


- Caenorhabditis elegans cul-1 (or lin-19), a protein required for developmentally programmed transitions from the
G1 phase of the cell cycle to the G0 phase or the apoptotic pathway.

- Caenorhabditis elegans cul-2, cul-3, cul-4 (F45E12.3), cul-5 (ZK856.1) and cul-6 (K08E7.7).

- Mammalian CUL1, CUL2, CUL3, CUL4A and CUL4B.

- Mammalian vasopressin-activated calcium-mobilizing receptor (VACM-1), a kidney-specific protein thought to form
a cell surface receptor [2] but which does not have any structural hallmarks of a receptor.

- Drosophila lin19.

- Yeast CDC53 [3], which acts in concert with CDC4 and UBC3 (CDC34) to control the G1-to-S phase transition.

- Yeast hypothetical protein YGR003w. 

- Fission yeast hypothetical protein SpAC24H6.03.

[0364] The cullins are hydrophilic proteins of 740 to 815 amino acids. The C-terminal extremity is the most conserved
part of these proteins. A signature pattern has been developed from that region. 

- Consensus pattern: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-Y-x-[SA]

[ 1] Kipreos E.T., Lander L.E., Wing J.P., He W.W., Hedgecock E.M. 

Cell 85:829-839(1996). 
[ 2] Burnatowska-Hledin M.A., Spielman W.S., Smith W.L., Shi P., Meyer J.M.,
Dewitt D.L.

Am. J. Physiol. 268:F1198-F1210(1995).
[ 3] Mathias N., Johnson S.L., Winey M., Adams A.E., Goetsch L., Pringle J.R.,

Byers B., Goebl M.G. 
Mol. Cell. Biol. 16:6634-6643(1996).
[0365] 102. (Cu_amine_oxid) 

Copper amine oxidase signatures

[...] of a wide range of biogenic amines including many [...] (EC 1.4.3.4) and copper-containing (EC 1.4.3.6). [...]
homodimeric enzyme that binds [...]

[...] belong to this class detected by the pattern: ALL, except for lentil AO.

[ 1] Knowles P.F., Dooley D.M. (In) Metal ions in biological systems; Sigel H., Sigel A., Eds., 30:361-403, Marcel [...]

M.J., Knowles P.F. Structure 3:1171-1184(1995). 
[0370] 103. Cys-protease (Cysteine protease) 

Number of members: 358

[0371] Eukaryotic thiol proteases (EC 3.4.22.-) [...] are a family of proteolytic enzymes which contain an active site
[...] histidine side chain; an [...]

[...] sequences).
- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15) and S (EC 3.4.22.27) [2].

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4, [...] (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2),
chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and pro[...]; [...] and RD21A.

- House-dust mites allergens DerP1 and EurM1.

- [...] AC-2), and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2. 

- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various [...]

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.


- Yeast thiol protease BLH1/YCP1/LAP3. 

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.
[0372] Two bacterial peptidases are also part of this family: 

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5]. 

- Thiol protease tpr from Porphyromonas gingivalis. 
[...] <PDOC00382>). [...] the major blood stage antigen. This protein of 111 Kd [...]

[...] patterns.

[...] [LIVMF] [N is the active site residue]

[ 1] Dufour E. 

Biochimie 70:1335-1342(1988). 
[ 2] Kirschke H., Barrett A.J., Rawlings N.D.

[...] Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M.
Nature 340:604-604(1989).

[...]
J Mol Biol 1996;262:202-224.

Cystathionine gamma-lyase, 
Cystathionine gamma-synthase, 
Cystathionine beta-lyase, 
Methionine gamma-lyase,

OAH/OAS sulfhydrylase,
O-succinylhomoserine sulfhydrylase
- Cystathionine gamma-lyase (EC 4.4.1.1) (gamma-cystathionase), which catalyzes the transformation of cystathionine
into cysteine, oxobutanoate and ammonia. This is the final reaction in the transsulfuration pathway that leads
from methionine to cysteine in eukaryotes.

- Cystathionine gamma-synthase (EC 4.2.99.9), which catalyzes the conversion of cysteine and succinylhomoserine
into cystathionine and succinate: the first step in the biosynthesis of methionine from cysteine in bacteria (gene [...]).

- Cystathionine beta-lyase (EC 4.4.1.8) (beta-cystathionase), which catalyzes the conversion of cystathionine into
homocysteine, pyruvate and ammonia: the second step in the biosynthesis of methionine from cysteine in bacteria 

- Methionine gamma-lyase (EC 4.4.1.11) (L-methioninase), which catalyzes the transformation of methionine into
methanethiol, oxobutanoate and ammonia. 

- OAH/OAS sulfhydrylase, which catalyzes the conversion of acetylhomoserine into homocysteine and that of
acetylserine into cysteine (gene MET17 or MET25 in yeast).

- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-). 

- Yeast hypothetical protein YGL184c.

- Yeast hypothetical protein YHR112c. 

[0377] These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P group is attached to a lysine
residue located in the central section of these enzymes; the sequence around this residue is highly conserved and can 
be used as a signature pattern to detect this class of enzymes. 

- Consensus pattern: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH] [K is the
pyridoxal-P attachment site]
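Because the pattern singles out one residue as the pyridoxal-P attachment site, a scan is more informative when it reports that lysine's own position rather than the start of the match. A minimal sketch, assuming Python's standard re module and a hand translation of the pattern above; the helper name plp_lysine_positions and the test peptide are illustrative only and do not come from any cited enzyme.

```python
import re

# Hand translation of the pattern quoted above:
# [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH]
# The lysine is captured in a named group so its own position can be reported.
PLP_SITE = re.compile(
    r"[DQ][LIVMF].{3}[STAGC][STAGCI]T(?P<lys>K)[FYWQ][LIVMF].G[HQ][SGNH]"
)


def plp_lysine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of candidate pyridoxal-P attachment lysines."""
    return [m.start("lys") + 1 for m in PLP_SITE.finditer(sequence)]


# Illustrative peptide (not a real enzyme sequence) containing one match.
print(plp_lysine_positions("MADLAAASSTKFLAGHSRR"))  # [11]
```

Capturing only the K keeps the reported coordinate tied to the residue the pattern annotates, which is what a downstream comparison with the attachment site would need.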

[ 1] Ono B.I., Tanaka K., Naito K., Heike C., Shinoda S., Yamamoto S., Ohmori S., Oshima T., Toh-E A. J. Bacteriol.

[ 2] [...] D.B., Clark M.W., Keng T., Ouellette B.F.F., Storms R.K., Zeng B., Zhong W.W., Fortin
N., Delaney S., Bussey H. Yeast 9:363-369(1993).

[0378] 105. Cyt_reductase 
FAD/NAD-binding Cytochrome reductase 
Number of members: 60 

FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other 

flavoprotein reductases. 
Lu G, Campbell WH, Schneider G, Lindqvist Y; 
Structure 1994;2:809-821. 

Sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH; 

J Biol Chem 1991;266:23542-23547. 
[0379] 106. Cytidylyltrans 
Phosphatidate cytidylyltransferase 

Phosphatidate cytidylyltransferase (EC 2.7.7.41) [1,2,3] (also known as CDP-diacylglycerol synthase) (CDS) is
the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA). CDP-diacylglycerol
is an important branch point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound 
enzyme. A conserved region located in the C-terminal part has been selected as a signature pattern. 
- Consensus pattern: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMF]-D

[ 1] Sparrow C.P., Raetz C.R.H.

J. Biol. Chem. 260:12084-12091(1985). 
[ 2] Shen H., Heacock P.N., Clancey C.J., Dowhan W. 

J. Biol. Chem. 271:789-795(1996). 
[ 3] Saito S., Goto K., Tonosaki A., Kondo H. 

J. Biol. Chem. 272:9503-9509(1997). 



EP 1 033 405 A2

[...]

[...] glycerol-3-phosphate cytidylyltransferase.

Number of members: 64

[...] phosphocholine cytidylyltransferase: insights into regulatory mechanisms and [...]

[...] a domain is known to exist in the [...]:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies of the
cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits, a catalytic [...]. [...] of cAPK
and cGPK is due to an amino acid [...]: a threonine that is invariant in cGPK is an alanine in most cAPK.

- [...] channels. Two such cation channels have been fully characterized. One is found in [...] visual signal
transduction. It specifically binds to cGMP, leading to an opening of the channel [...] depolarization of rod
photoreceptors. In olfactory [...].

[...] There are six invariant amino [...] of beta-barrel [...] conserved G[...]

Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).



[0384] 109. (cadherin)

Cadherins extracellular repeated domain signature

[...] calcium-dependent cell-cell adhesion. Cadherins [...]. A wide number of tissue-specific forms of cadherins are known:

- Epithelial (E-cadherin) (also known as uvomorulin or L-CAM) (CDH1).

- Neural (N-cadherin) (CDH2). 

- Placental (P-cadherin) (CDH3).

- Retinal (R-cadherin) (CDH4). 

- Vascular endothelial (VE-cadherin) (CDH5).

- Kidney (K-cadherin) (CDH6). 

- Cadherin-8 (CDH8). 

- Osteoblast (OB-cadherin) (CDH11).

- Brain (BR-cadherin) (CDH12).

- T-cadherin (truncated cadherin) (CDH13).
- Muscle (M-cadherin) (CDH14).

- Liver-intestine (LI-cadherin).

- EP-cadherin. 

[...] residues, then an extracellular domain [...] can be subdivided into five parts: there [...]
[...] involved in the interaction with plaque proteins:
- Desmoglein 1 (desmosomal glycoprotein I).

- Desmoglein 2. 

- Desmoglein 3 (Pemphigus vulgaris antigen). 

[...] in its extracellular domain. [...] repeated domain is located in the C-terminal extremity [...]

[...] asparagines; these residues could be implicated in the binding of [...] class detected by the [...]

[...] there is a deletion of one residue after the second conserved Asp.
[ 1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990).

[0390] 110. Calreticulin family

Calreticulin [1] (also known as calregulin, CRP55 [...]) is a high-capacity calcium-binding protein which is present
in most tissues and located at the periphery of [...] the sarcoplasmic reticulum (SR) membranes. It probably plays a
role in the storage of calcium [...] and SR, and it may well have other important functions. Structurally,
calreticulin is a protein [...] consisting of three domains: a) An N-terminal, probably globular, domain [...]
(N-domain); b) A central domain of about 70 residues (P-domain) which contains three repeats of an acidic 17 [...]
motif. This region binds calcium with a low-capacity, but a high-affinity; c) [...] (C-domain). This region binds
calcium with a high-capacity [...]. Calreticulin is [...] related to the following proteins:

- Onchocerca volvulus antigen RAL-1. RAL-1 is highly [...], but possesses a C-terminal domain rich in lysine and
arginine and lacks the acidic [...]. It is [...] not expected to bind calcium in that region.

- Calnexin [2]. A calcium-binding protein that interacts with newly [...]. It seems to play a major role in the
quality control apparatus of the ER by the retention of incorrectly folded proteins.

- Calmegin [...]

Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]

[ 1] Michalak M., Milner R.E., Burns K., [...] Sci. 19:124-128(1994).

7744-7749(1994). 

[0391] 111. [...]

Carbonic anhydrases (EC 4.2.1.1) (CA) [...] catalyze the reversible hydration of carbon dioxide. Eight enzymatic and
evolutionarily related forms of carbonic anhydrase are currently known to exist in vertebrates: three cytosolic
isozymes (CA-I, CA-II and CA-III), two [...] forms (CA-IV and CA-VII); a mitochondrial form (CA-V); a secreted
salivary form (CA-VI) and a [...]characterized isozyme [5]. In the alga Chlamydomonas reinhardtii, two CA isozymes
have been [...]; they are periplasmic glycoproteins evolutionarily related to vertebrate [...]. [...] also have a
eukaryotic-type CA.

[...] (see <PDOC00323>).

- Consensus pattern: [...]-[LIVMFA](2) [The second H is a zinc ligand]

[...] (see <PDOC00586>).
[ 1] Deutsch H.F. Int. J. Biochem. 19:101-113(1987).
[ 2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988).
[ 3] Tashian R.E. BioEssays 10:186-192(1989).

[0392] 112. Caseins alpha/beta signature

Caseins [1] are the major protein constituents of [...] into two families; the first consists of the kappa-caseins,
and the second groups [...] and beta-caseins. The alpha/beta caseins are a [...] a cluster of phosphorylated serine
residues [...] eight residues of the signal sequence.

Consensus pattern: [...]

[1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988).

[0393] 113. Catalase signatures

Catalase (EC 1.11.1.6) [1,2,3] is an enzyme, present in all aerobic cells, that decomposes hydrogen peroxide to
molecular oxygen [...]. Its main function is to [...] in eukaryotic organisms and in some prokaryotes [...] identical
subunits. Each of the subunits binds one protoheme IX group. A [...] proximal side ligand. The region around this
[...] includes a conserved arginine that participates in heme [...] around this residue has been selected [...].

- Consensus pattern: [...] [Y is the proximal heme-binding ligand]

- Consensus pattern: [...] [H is an active site residue]

[0394] 114. (chitin binding) Chitin recognition or binding domain

A conserved domain of 43 amino acids is found in several plant and [...] proteins that have a common binding
specificity for oligosaccharides of [...]. This domain may be involved in the recognition or binding of chitin
subunits. It has been found in the [...]:

- A number of non-leguminous plant lectins. The best characterized of these lectins are the three [...] wheat germ
agglutinins (WGA-1, 2 and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid-binding lectin whose structure
consists of a fourfold repetition of the 43 amino acid domain. The same type of structure is found in a
[...]-specific lectin as well as a rice lectin.

- Plant endochitinases (EC 3.2.1.14) from class IA [...]. Endochitinases are enzymes that catalyze the hydrolysis of
the beta-1,4 linkages of N-acetylglucosamine polymers of chitin. Plant chitinases function as a defense against
chitin-containing fungal pathogens. Class IA chitinases [...] at their N-terminal extremity. An exception [...] two
copies of the domain.

- Hevein [5], a [...] found in the latex of rubber trees.

- Win1 and win2, two wound-induced proteins from potato.

- [...] toxin alpha subunit [3]. The toxin encoded by the linear plasmid pGKL1 is composed of three [...]: alpha,
beta, and gamma. The gamma subunit harbors toxin activity and inhibits growth of sensitive yeast [...] phase of the
cell cycle; the alpha subunit, which [...] proteolytically processed from a larger precursor that [...]

The 43-residue domain directly follows the signal sequence [...]

[Figure: schematic representation of the domain, showing the conserved cysteines involved in disulfide bonds and the position of the pattern.]
Consensus pattern: [...]

[...] in chitin polymers. From the viewpoint of sequence similarity, [...] enzymes [...] classification of glycosyl
hydrolases [2,E1]. [...] from plants that function in the defense [...] by destroying their chitin-containing cell
wall. Class IA/I and IB/II enzymes differ in the presence [...] (IB/II) of a N-terminal chitin-binding domain (see
the relevant entry <PDOC00025>). The [...] about [...] to [...] amino acid [...]. [...] the first one is located in
the N-[...] which is probably involved in [...]

- Consensus pattern: [...]-F-[GSA]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991). 

[0396] 116. chloroa_b-bind

Chlorophyll A-B binding proteins. Number of members: 211 

[0397] 117. chromo

The "chromo" (Chromatin Organization Modifier) domain [...] is a conserved region of about 60 amino acids which
was originally found in Drosophila [...] are proteins that modify the structure of chromatin [...] where gene
expression is repressed.

[...] a chromo domain seem to fall into three classes:

b) Proteins with a single chromo domain. 

c) Proteins with paired tandem chromo domains. 

[0398] Currently this domain has been found in the following proteins:
[0399] Class A. 
- Drosophila heterochromatin protein Su(var)205 (HP1).
- Human heterochromatin protein HP1 alpha.

[0400] Class B.

- Drosophila protein Polycomb (Pc). 

- [...] Magnaporthe grisea and CfT-1 from Cladosporium fulvum.

- Fission yeast hypothetical protein SpAC[...].

- Caenorhabditis elegans hypothetical protein C29H12.5.
- Caenorhabditis elegans hypothetical protein ZK1236.2.

- Caenorhabditis elegans hypothetical protein T09A5.8.

[0401] Class C. 
- Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4.

- Yeast protein CHD1. 

[...] metal ion cofactors. [...] identical subunits. In eukaryotes, there are two isozymes [...]

- Consensus pattern: [...]-x(2)-D-[PS]-R [H is an active site residue]
[0406] [...] Karpusas M., Branchaud B., Remington S.J. Biochemistry 29:2213-2219(1990).
[0407] 119. clpA_B

Number of members: 39

[...] from extreme stress by controlling the [...]

are listed below. 

- [...] clpA, which acts as the regulatory subunit of the ATP-dependent protease clp.

- Neurospora heat shock protein hsp98.

- Trypanosoma brucei protein clp.

- Porphyra purpurea chloroplast encoded clpC.

[...] conserved regions of about [...] amino acids [...]

[...] A and B motifs [...] C-terminal [...] motifs.
[0410] 120. cofilin_ADF
Cofilin/tropomyosin-type actin-binding proteins 

[1] 

Medline: 97290449 

Nat Struct Biol 1997;4:366-369. 
[2] 

Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman EE, 
Nat Struct Biol 1997;4:369-373. 


Lappalainen P, Drubin DG;

Nature 1997;388:78-82.
[...] severs actin filaments and binds to actin monomers.

- [...] in a pH-independent manner [...]

- Caenorhabditis elegans unc-60.
- Acanthamoeba castellanii actophorin.
- Plants actin depolymerizing factor (ADF).

[...] Gen. Genet. 242:346-357(1994).

[5] [...] Chem. 267:7240-7244(1992).

[...] NADH dehydrogenase 24 Kd subunit signature

Respiratory-chain [...] as a signature pattern.

[ 1] [...] J. Biochem. 197:563-576(1991).

[0414] 122. copper-bind

Copper binding proteins, plastocyanin/azurin family

Number of members: 70

[0415] Blue or "type-1" copper proteins are [...] which bind a single copper atom and which are characterized by an
intense electronic absorption band near 600 nm [...]. [...] are the plant chloroplastic plastocyanins [...] family of
proteins [...]:

- Amicyanin from [...]

- Blue copper protein from Alcaligenes faecalis.
- Cupredoxin (CPC) from cucumber peelings [4].

- Stellacyanin from the Japanese lacquer tree.
- Umecyanin from horseradish roots.

- [...] lost the ability to bind copper.

[...] Yamanaka T. FEBS Lett. 268:[...]-162(1991).
They seem to assist other polypeptides in maintaining or assuming conformations which permit their correct assembly
into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins
form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as cpn60
(groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn10 protein binds to cpn60 in the
presence of MgATP and suppresses the ATPase activity of the latter. Cpn10 is a protein of about 100 amino acid
residues whose sequence is well conserved in bacteria, vertebrate mitochondria and plant chloroplasts [3,4]. Cpn10
assembles as a heptamer that forms a dome [5]. As a signature pattern for cpn10, a region located in the N-terminal
section of the protein was selected.
Consensus pattern: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY](3)-
Note: this pattern is found twice in the plant chloroplast protein, which consists of a tandem repeat of a cpn10 domain.

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).
[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).
[ 3] Hartman D.J., Hoogenraad N.J., Condron R., Hoj P.B. Proc. Natl. Acad. Sci. U.S.A. 89:3394-3398(1992).
[ 4] Bertsch U., Soll J., Seetharam R., Viitanen P.V. Proc. Natl. Acad. Sci. U.S.A. 89:8696-8700(1992).
[ 5] Hunt J.F., Weaver A.J., Landry S.J., Gierasch L., Deisenhofer J. Nature 379:37-45(1996).

[0418] 124. Chaperonins cpn60 signature (cpn60_TCP1 ) 

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
Their role seems to be to assist other polypeptides to maintain or assume conformations which permit their correct
assembly into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria.
Chaperonins form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as
cpn60 (groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn60 protein shows weak
ATPase activity and is a highly conserved protein of about 550 to 580 amino acid residues which has been described
by different names in different species:

- Escherichia coli groEL protein, which is essential for the growth of the bacteria and the assembly of several
bacteriophages.
- Cyanobacterial groEL analogues.
- Mycobacterium tuberculosis and leprae 65 Kd antigen, Coxiella burnetii heat shock protein B (gene htpB), Rickettsia
tsutsugamushi major antigen 58, and Chlamydial 57 Kd hypersensitivity antigen (gene hypB).
- Chloroplast RuBisCO subunit binding-protein alpha and beta chains, which bind ribulose bisphosphate carboxylase
small and large subunits and are implicated in the assembly of the enzyme oligomer.
- Mammalian mitochondrial matrix protein P1 (mitonin or P60).
- Yeast HSP60 protein, a mitochondrial assembly factor.

As a signature pattern for these proteins, a rather well-conserved region of twelve residues, located in the last
third of the cpn60 sequence, was chosen.
Consensus pattern: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]- 

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991). 

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[0419] Chaperonins TCP-1 signatures (cpn60_TCP1) 

The TCP-1 protein [1,2] (Tailless Complex Polypeptide 1) was first identified in mice, where it is especially abundant
in testis but present in all cell types. It has since been found and characterized in many other mammalian species, in
Drosophila and in yeast. TCP-1 is a highly conserved protein of about 60 Kd (556 to 560 residues) which participates
in a hetero-oligomeric 900 Kd double-torus shaped particle [3] with 6 to 8 other different subunits. These subunits,
the chaperonin containing TCP-1 (CCT) subunits beta, gamma, delta, epsilon, zeta and eta, are evolutionarily related
to TCP-1 itself [4,5]. The CCT is known to act as a molecular chaperone for tubulin, actin and probably some other
proteins.
[0420] The CCT subunits are highly related to archaebacterial counterparts:

- TF55 and TF56 [6], a molecular chaperone from Sulfolobus shibatae. TF55 has ATPase activity, is known to bind
unfolded polypeptides and forms an oligomeric complex of two stacked nine-membered rings.
- Thermosome [7], from Thermoplasma acidophilum. The thermosome is composed of two subunits (alpha and beta) and also
seems to be a chaperone with ATPase activity. It forms an oligomeric complex of eight-membered rings.

The TCP-1 family of proteins are weakly, but significantly [8], related to the cpn60/groEL chaperonin family (see
<PDOC00268>). As signature patterns of this family of chaperonins, three conserved regions located in the N-terminal
domain were chosen.
Consensus pattern: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x(2)-K-[LIVMF](2)-

Consensus pattern: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-[SNH]-[PQH]- 
Consensus pattern: Q-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T- 



[ 1] Ellis J. Nature 358:191-192(1992).

[ 2] Nelson R.J., Craig E.A. Curr. Biol. 2:487-489(1992).

[ 3] Lewis V.A., Hynes G.M., Zheng D., Saibil H., Willison K.R. Nature 358:249-252(1992).
[ 8] Hemmingsen S.M. Nature 357:650-650(1992).

[...] superfamily shared by the cyclins, [...]

Gibson TJ, Thompson JD, Blocker A, Kouzarides T;
Nucleic Acids Res 1994;22:946-952. 
[2] 

Medline: 96164440 

Mitchell E, Rasmussen B, Hunt T, Johnson LN; 
Structure. 1995;3:1235-1247. 
Complex of cyclin and cyclin dependant kinase. 
[3]

Russo AA, Jeffrey PD, Pavletich NP;
Nat Struct Biol. 1996;3:696-700.

[...] and may be related but has not been included.

Number of members: 189

[...] role in controlling nuclear cell division cycles [...]

[...] this, a 32-residue pattern has been derived.
- Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor
carbohydrate groups.

- Type 2 cystatins, molecules of about 115 amino acid residues which contain one or two disulfide loops near their
C-terminus.

- Kininogens, which are multifunctional plasma glycoproteins. 

[0428] They are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping to
position optimally prekallikrein and factor XI next to factor XII. They are also inhibitors of cysteine proteases.
Structurally, kininogens are made of [...] which contains the sequence of bradykinin. The first of the three cystatin
domains seems to have lost its inhibitory activity.

In all these inhibitors, there is a conserved region of five residues which has been proposed to be important
for the binding to the cysteine proteases. The consensus pattern starts one residue before this conserved region.

- Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV]

[1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987). 
[2] Rawlings N.D., Barrett A.J. J. Mol. Evol. 30:60-71(1990). 

[3] Turk V., Bode W. FEBS Lett. 285:213-219(1991).
[4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol. 45:65-76(1991).

[0430] 127. cytochrome_c (Cytochrome c) 
The Pfam entry does not include all PROSITE members.
The cytochrome 556 and cytochrome c' families are not included. 

In proteins belonging to the cytochrome c family [1], the heme group is covalently attached by thioether bonds to
two conserved cysteine residues. The consensus sequence for this site is Cys-X-X-Cys-His, and the histidine residue
is one of the two axial ligands of the heme iron. This arrangement is shared by all proteins known to belong to the
cytochrome c family, which presently includes cytochromes c, c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and
reaction center cytochrome c.

- Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW} 
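Curly braces in this pattern mean "any residue except the ones listed", which corresponds directly to a negated character class in a regular expression. Multiheme members of the family, such as cytochrome c3, carry several Cys-X-X-Cys-His sites, so a scan should report every hit rather than the first one. A minimal sketch, assuming Python's standard re module; the helper name heme_binding_sites and the test peptides are hypothetical.

```python
import re

# Hand translation of C-{CPWHF}-{CPWR}-C-H-{CFYW}: each {..} element becomes
# a negated character class, so it matches any residue except those listed.
HEME_SITE = re.compile(r"C[^CPWHF][^CPWR]CH[^CFYW]")


def heme_binding_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the first Cys, Cys-X-X-Cys-His core) per hit."""
    return [(m.start() + 1, m.group()[:5]) for m in HEME_SITE.finditer(sequence)]


print(heme_binding_sites("ACAQCHTGGCKSCHA"))  # [(2, 'CAQCH'), (10, 'CKSCH')]
print(heme_binding_sites("AACPQCHTAA"))       # [] since the Pro after the first Cys is excluded
```

Note that the final {CFYW} element requires one more residue after the His, so a Cys-X-X-Cys-His at the very end of a sequence is not reported by this pattern.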

[0432] [ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985). 

[0433] 128. (DAGKa) Diacylglycerol kinase accessory domain (presumed)

Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed [...]
[1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.
[0436] 129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed)

Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic domain is
assumed from the finding of bacterial homologues.

[1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.

[0439] 130. D-amino acid oxidases signature (DAO)
[0440] D-amino acid oxidase (EC 1.4.3.3) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of
neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized [...] in fungi and
vertebrates, where they are known to be located in the peroxisomes.

D-aspartate oxidase (EC 1.4.3.1) (DASOX) [1] is an enzyme, structurally related to DAO, which catalyzes the same
reaction but is active only toward dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown [2] to be
important for the enzyme's catalytic activity. A signature pattern for these enzymes has been developed from the
conserved region around this residue.
[0441] Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable active site residue]
(1991).



[0442] 131. DEAD and DEAH box families ATP-dependent helicases signatures
A number of eukaryotic and prokaryotic proteins have been characterized on the basis of their structural similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong to this family are:
- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex involved in 5' cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- Pl10, a mouse protein [...].
- [...], closely related to Pl10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to Pl10.
- Caenorhabditis elegans helicase [...].
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68 [...]. It has ATPase and DNA-helicase activities in vitro.
- [...], a Drosophila putative RNA helicase related to p68.
- [...], a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein [...].
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and specification of embryonic posterior structures.
- Me31B, a Drosophila protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the [...] gene.
- rhlE, an Escherichia coli putative RNA helicase.
- [...] activity. It probably interacts with 23S ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein [...].

All these proteins share a number of conserved sequence motifs. Some of them are specific to this family, while others are shared by other ATP-binding proteins or by proteins belonging to the helicases superfamily. One of these motifs, the 'D-E-A-D box', represents a special version of the B motif of ATP-binding proteins.

Some other proteins belong to a subfamily which have His instead of the second Asp and are thus said to be 'D-E-A-H box' proteins. Proteins currently known to belong to this subfamily are:
- PRP2, PRP16, PRP22 [...]. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- [...], which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males for dosage compensation of X chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or cross-linking agents. [...] and the DNA excision repair protein XPD (ERCC-2) are the homologs of RAD3.
- [...], which is important for chromosome transmission and normal cell cycle progression.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical [...].
- [...] putative RNA helicase.

Signature patterns for both subfamilies were developed.
[0443] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
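The consensus patterns in this section use PROSITE syntax: elements are separated by '-', 'x' stands for any residue, square brackets list allowed residues, braces list forbidden residues, and (n) or (n,m) gives a repeat count. A minimal sketch of turning such a pattern into a regular expression and running it against a sequence is shown below, using the DEAD-box pattern above. The helper name `prosite_to_regex` is hypothetical, it deliberately ignores the '<' and '>' terminus anchors, and the test peptide is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE consensus pattern into a Python regex.

    Supports single residues, x (any residue), [..] (any of), {..} (none of)
    and repeat counts (n) or (n,m). Terminus anchors '<' and '>' are not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # a residue or a [...] class is valid regex as-is
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


DEAD_BOX = "[LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]"

if __name__ == "__main__":
    regex = prosite_to_regex(DEAD_BOX)
    print(regex)  # [LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]
    hit = re.search(regex, "GRTLVLDEADRMLDMG")  # made-up test peptide
    print(hit.group(0) if hit else None)  # VLDEADRML
```

The same helper applies unchanged to the other patterns quoted in this section, for example the dihydroorotase pattern [GA]-[ST]-D-x-A-P-H-x(4)-K.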

entry <PDOC00017 
Dihydrodipicolinate synthase (DHDPS) (gene dapA) catalyzes the first reaction specific to the biosynthesis of lysine and of diaminopimelate. DHDPS is responsible for the condensation of aspartate semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to the enzyme by forming a Schiff base with a lysine residue. The following proteins are structurally related to DHDPS and probably also act via a similar catalytic mechanism:
- Escherichia coli N-acetylneuraminate lyase (gene nanA), which catalyzes the condensation of N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate.
- Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine 3-O-methyl-scyllo-inosamine.
- An Escherichia coli hypothetical protein.
Two signature patterns for these enzymes were developed. The first one [...]. The second signature contains a lysine residue involved in Schiff-base formation.

[STAC] [K is involved in Schiff-base formation]- 

[ 1] Kaneko T., Hashimoto T., [...]

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of pyrimidines, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FAD flavoprotein. In bacteria (gene pyrD), DHOdehase is located on the inner side of the cytoplasmic membrane. In some yeasts, such as Saccharomyces cerevisiae (gene URA1), it is a cytosolic protein, while in other eukaryotes it is found in the mitochondria [1].

[0452] 135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase

C-5 cytosine-specific DNA methylases (EC 2.1.1.37) are enzymes that specifically methylate the C-5 carbon of cytosines in DNA [1,2,3]. Such enzymes are found as a component of type II restriction-modification systems in prokaryotes and some bacteriophages. They recognize a specific DNA sequence where they methylate a cytosine; in doing so, they protect DNA from cleavage by type II restriction enzymes that recognize the same sequence. [...] The mammalian enzyme is [...].

Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the active site residue]
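Patterns of this kind also annotate one specific residue, here the active-site cysteine, so once a match is found that residue's position follows from its fixed offset inside the pattern. The sketch below assumes the signature reads [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S (the leading [DENKS] is an assumption, since the extracted text only preserves the pattern from x-[FLIV] onward); with that reading the cysteine sits at 0-based offset 8 of every match. The function name and the test sequence are hypothetical.

```python
import re

# Assumed C-5 cytosine DNA methylase signature:
# [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S, C = active-site residue.
C5_MTASE_SIGNATURE = re.compile(r"[DENKS].[FLIV].{2}[GSTC].PC.{2}[FYWLIM]S")
# [DENKS] x [FLIV] x x [GSTC] x P C -> the cysteine is the 9th element (offset 8).
ACTIVE_SITE_OFFSET = 8


def active_site_cysteines(sequence: str) -> list[int]:
    """1-based positions of the catalytic Cys in each (non-overlapping) match."""
    return [m.start() + ACTIVE_SITE_OFFSET + 1
            for m in C5_MTASE_SIGNATURE.finditer(sequence.upper())]


if __name__ == "__main__":
    print(active_site_cysteines("MKDVFIRGAPCQGFSAA"))  # made-up sequence; Cys at 11
```

Deriving the position from the pattern layout keeps the annotation tied to the signature itself rather than to any particular protein's numbering.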



(1991). 



[0455] 137. (DNA_photolyase) DNA photolyase
Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of the dimer. [...] Several conserved sequence regions are found in these enzymes, especially in their C-terminal part. Two of these regions were selected as signature patterns.
[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).

[ 2] [...] Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).

[0456] (DNA_photolyase2) DNA photolyase
Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore cofactors for its activity: a reduced FADH2 and either a methenyltetrahydrofolate (MTHF) or an 8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The MTHF or 8-HDF chromophore appears to function as an antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities [3], DNA photolyases can be grouped into two classes. The first class contains enzymes from bacteria, the halophilic archaebacterium Halobacterium halobium, fungi and plants. Class 1 enzymes [...] terminal part. Two of these regions were selected as signature patterns.

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[ 2] [...] Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151(1994).

[ 3] Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).
[0458] 138. (DNA_pol_A) 

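On the basis of sequence similarities, a number of DNA polymerases have been grouped together under the designation of DNA polymerase family A. The polymerases that belong to this family are listed below.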

Escherichia coli and various other bacterial polymerase I (gene polA). 
Thermus aquaticus Taq polymerase. 
Bacteriophage SPO1 polymerase.
Bacteriophage SPO2 polymerase.
Bacteriophage T5 polymerase.
Bacteriophage T7 polymerase.
Mycobacteriophage L5 polymerase. 
Yeast mitochondrial polymerase gamma (gene MIP1). 

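As a signature pattern, a conserved region known as 'motif B' [1] was selected; it is located in a domain which, in Escherichia coli polA, has been shown to bind deoxynucleotide triphosphate substrates.

Sequences known to belong to this class detected by the pattern: ALL.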

[ 1] Delarue M., Poch O., Todro N., Moras D., Argos P. Protein Eng. 3:461-467(1990). 
[ 2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[ 3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993). 

[0461] 139. DNA_pol_viral_C 
DNA polymerase (viral) C-terminal domain 
Number of members: 128 
[0462] 140. (DNA_topoisoII)



DNA topoisomerase II signature 
DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes and in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the products of genes 39, 52 and 60). In prokaryotes and in archaebacteria, the enzyme, known as DNA gyrase, consists of two subunits (genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as topoisomerase IV and is required for chromosome segregation. It also consists of two subunits (genes parC and parE). In eukaryotes, type II topoisomerase is a homodimer.

[0463] There are many regions of sequence homology between the different subtypes of topoisomerase II. The 
relation between the different subunits is shown in the following representation: 



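<--------------------- About 1400 residues --------------------->

[------ Protein 39 ---*-----][-------- Protein 52 --------]  Phage T4
[--------- gyrB -----*-----][----------- gyrA -----------]  Prokaryote II, Archaebacteria
[--------- parE -----*-----][----------- parC -----------]  Prokaryote IV
[--------------------*-----------------------------------]  Eukaryote and ASF

'*': Position of the pattern.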

Position of the pattern. 

[0464] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was 
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[0465] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]
Sequences known to belong to this class detected by the pattern: ALL.
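A pattern match says that a sequence carries the signature; for annotation it is usually also useful to know where. The sketch below, assuming sequences are plain one-letter strings, writes the topoisomerase II pattern [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] directly as a regular expression and reports 1-based start positions. A zero-width lookahead is used so that overlapping hits are not skipped. The function name and the test sequence are illustrative only.

```python
import re

# [LIVMA]-x-E-G-[DN]-S-A-x-[STAG], wrapped in a lookahead so that
# finditer() also reports hits that overlap a previous one.
TOPO2_SIGNATURE = re.compile(r"(?=([LIVMA].EG[DN]SA.[STAG]))")


def find_signature_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched peptide) for every signature hit."""
    return [(m.start() + 1, m.group(1)) for m in TOPO2_SIGNATURE.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence with two overlapping hits (residues 1-9 and 8-16).
    print(find_signature_sites("VKEGDSAASEGNSAKT"))
    # A plain, non-lookahead search finds only the first of the two.
    print(re.findall(r"[LIVMA].EG[DN]SA.[STAG]", "VKEGDSAASEGNSAKT"))
```

Overlapping hits are rare in real proteins, but the lookahead costs nothing and avoids silently under-reporting them.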

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). 

[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991). 

[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995). 

[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995). 

[0466] 141. (DSPc) Tyrosine specific protein phosphatases signature and profiles

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth, proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The currently known PTPases are listed below.

Soluble PTPases.
- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>) and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3 [...].
- Yeast CDC14, which may be [...].
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus PTPase.

Dual specificity PTPases.
- DUSP1 (MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates [...] ERK2 on both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

Receptor PTPases. Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first seems to have enzymatic activity, while the second is inactive but seems to affect the substrate specificity of the first. In these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not. In the following table, the domain structure of known receptor PTPases is shown:

                                  Extracellular           Intracellular
                                  Ig  FN-III  CA  MAM     PTPase
Leukocyte antigen related (LAR)   3   8       0   0       2
Drosophila DLAR                   3   9       0   0       2
Drosophila DPTP                   2   2       0   0       2
PTP-alpha (LRP)                   0   0       0   0       2
PTP-beta                          0   16      0   0       1
PTP-epsilon                       0   0       0   0       2
PTP-kappa                         1   4       0   1       2
PTP-mu                            1   4       0   1       2
PTP-zeta                          0   1       1   0       2

[ 5] Hunter T. Cell 58:1013-1016(1989).

[0468] 142. (DUF10) Uncharacterized protein family
The following uncharacterized proteins have been shown to share regions of similarities:
- Goat antigen UK114, [...], its human homolog and the rat corresponding protein, perchloric acid soluble protein (PSP1). PSP1 [2] may inhibit an initiation stage of cell-free protein synthesis.
- Yeast chromosome V hypothetical protein YER057c.
- Yeast hypothetical protein YIL051c.
- Caenorhabditis elegans hypothetical protein C23G10.2.
- Escherichia coli hypothetical protein ycdK.
- Escherichia coli hypothetical protein yhaR.
- Escherichia coli hypothetical protein yoaB.
- Haemophilus influenzae protein [...].
- Haemophilus influenzae hypothetical protein HI1627.
- Helicobacter pylori [...].
- [...] yabJ.
- Myxococcus xanthus dfrA.
- Rhizobium strain NGR234 symbiotic [...].
- Synechocystis strain PCC 6803 hypothetical protein [...].
- [...] PH0854.
These are small proteins of [...]. Some of them contain two copies of the aligned region.

[ 2] [...] (1995).
implicated in resistance to ethidium bromide.

[0475] 146. (DapB) Dihydrodipicolinate reductase
Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of diaminopimelic acid and lysine, the NAD- or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. [...] The best conserved region in this [...].

[...] do have the typical helix-turn-helix motif of DNA-binding proteins [1].
[0482] [1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR; J Bacteriol 1992;174:144-154.

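[0483] 149. (Desaturase) Fatty acid desaturases
Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionary related. Family 1 is composed of:
- Stearoyl-CoA desaturase (SCD) [1]. SCD is a key regulatory enzyme of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoAs, converting palmitoyl- and stearoyl-CoA into palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic reticulum. As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected. This region is rich in histidine residues and in aromatic residues.
Family 2 is composed of:
- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]. These enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12) position of fatty acids bound to membrane glycerolipids. This enzyme is involved in chilling tolerance.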
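The delta(n) notation used for these enzymes counts carbons from the carboxyl end of the fatty acid (C1 is the carboxyl carbon), with the double bond lying between C(n) and C(n+1). The omega(n) notation counts the same bond from the methyl end instead. For a chain of N carbons the two are related by omega = N - delta, so the oleoyl group made from stearoyl (18 carbons) by a delta(9) desaturase is omega-9, and the palmitoleoyl group made from palmitoyl (16 carbons) is omega-7. A small conversion sketch follows; the function name is hypothetical.

```python
def delta_to_omega(chain_length: int, delta: int) -> int:
    """Convert a delta(n) double-bond position into omega(n) notation.

    delta counts from the carboxyl carbon (C1); the double bond lies between
    C(delta) and C(delta + 1). omega counts the same bond from the methyl end,
    which gives omega = chain_length - delta.
    """
    if not 1 <= delta < chain_length:
        raise ValueError("the double bond must lie inside the carbon chain")
    return chain_length - delta


if __name__ == "__main__":
    print(delta_to_omega(18, 9))   # oleoyl, from stearoyl by a delta(9) desaturase
    print(delta_to_omega(16, 9))   # palmitoleoyl, from palmitoyl
    print(delta_to_omega(18, 12))  # second double bond added by desA at delta(12)
```

The three calls give 9, 7 and 6 respectively.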

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0485] [...]

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of pyrimidines, the conversion of ureidosuccinic acid (N-carbamoyl-L-aspartate) into dihydroorotate. Dihydroorotase binds a zinc ion which is required for its catalytic activity [1]. In bacteria, DHOase is a protein of about 400 amino-acid residues (gene pyrC). In higher eukaryotes, DHOase is part of a large multifunctional protein known as 'rudimentary' in Drosophila [...].

Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K- 

[ 1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991). 
[ 2] Davidson J.N., Chen K.C., Jamison R.S., [...]

[...] Chem. [...]:12269-12276(1994).

[...] representation:

+------------+-------+----------+------------+
| N-terminal | Gly-R | CXXCXGXG | C-terminal |
+------------+-------+----------+------------+

[0489] It has been shown [2] that the 'J' domain as well as the 'CRR' domain are also found in other prokaryotic and eukaryotic proteins, which are listed below.

a) Proteins containing both a 'J' and a 'CRR' domain: 

- [...], which seems to be involved in mitochondrial protein import.

- Yeast protein SCJ1, involved in protein sorting.

- Yeast protein XDJ1. 

- Plant dnaJ homologs (from leek and cucumber).
- Human HDJ2, a dnaJ homolog of unknown function.

- Yeast hypothetical protein YNL077W. 

b) Proteins containing a 'J' domain without a 'CRR' domain:
- Rhizobium [...].

- Yeast protein SIS1, required for nuclear migration during mitosis.

- Yeast protein CAJ1.
- Yeast hypothetical protein YFR041c.
- Yeast hypothetical protein YIR004w.

- Human HDJ1.

- Human HSJ1, a neuronal protein.

- Drosophila cysteine-string protein (csp).

Consensus pattern: C- [DEGo rmn] i 
* m o», D M Lanse-T, DougKs M.G. ^«*^'^SS- W 

[...] 8940-8944.
[ 2] Kim J, Johnson K, Chen HJ, [...]

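[0496] 154. Dynein light chain type 1 signature
Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles and organelles along microtubules. Dynein is composed of a number of ATP-binding large subunits, intermediate size subunits and small subunits (light chains).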

[0499] [1] Cedergren-Zeppezauer ES, Larsson G, [...]
[ 2] Mol CD, Harris JM, McIntosh EM, Tainer JA; [...]

[0500] 156. (dCMP_cyt_deam) Cytidine and deoxycytidylate deaminases zinc-binding region signature
Cytidine deaminase (EC 3.5.4.5) catalyzes the hydrolysis of cytidine into uridine and ammonia, while deoxycytidylate deaminase (EC 3.5.4.12) hydrolyzes dCMP into dUMP. These two enzymes do not share any sequence [...]. [...] phosphate into 5-amino-6-(ribosylamino)-2,4(1H,3H)-pyrimidinedione 5'-phosphate. [...] of the antibiotics blasticidin S [...].

[0501] Consensus pattern. [CHHA^v] fc *w i 



H are zinc ligands 



[0502] 157. Dehydrins signatures
A number of proteins are produced by plants that experience water-stress. Water-stress takes place when the water available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response of plants to water-stress. Proteins that are expressed during water-stress are called dehydrins [1,2] or LEA (Late Embryogenesis Abundant) group 2 proteins. Proteins known to belong to this family include:
- Barley dehydrins B8, B9, B17 and B18.
- [...]
- Wheat dehydrin RAB 15 and cold-shock proteins cor410, cs66 and cs120.



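Dehydrins share a number of structural features. One of the most notable is the presence, in their central region, of a continuous run of serines followed by a cluster of charged residues. Such a region has been found in all known dehydrins so far, with the exception of pea dehydrins. A second conserved feature is a lysine-rich octapeptide, which was selected as a signature pattern.
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G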
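A single dehydrin can carry the lysine-rich octapeptide more than once, so a natural use of the pattern above is to count occurrences. The sketch below writes [KR]-[LIM]-K-[DE]-K-[LIM]-P-G as a regular expression. Checking every possible offset shows that two hits of this particular pattern always conflict at some residue, so they cannot overlap and counting non-overlapping matches is sufficient. The function name and the demonstration sequence are illustrative only.

```python
import re

# Dehydrin lysine-rich octapeptide, [KR]-[LIM]-K-[DE]-K-[LIM]-P-G.
K_OCTAPEPTIDE = re.compile(r"[KR][LIM]K[DE]K[LIM]PG")


def count_lysine_rich_motifs(sequence: str) -> int:
    """Count occurrences of the dehydrin octapeptide (hits cannot overlap)."""
    return len(K_OCTAPEPTIDE.findall(sequence.upper()))


if __name__ == "__main__":
    # Illustrative sequence: motif, serine run, charged cluster, motif.
    demo = "MEKKGIMDKIKEKLPGQHSSSSSSEDDGEKKGIMDKIKEKLPG"
    print(count_lysine_rich_motifs(demo))
```

For the demonstration sequence the count is 2, one per copy of the lysine-rich stretch.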

12.475-486(1989). 

[0505] 158. (deoR) Bacterial regulatory proteins, deoR family
The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified into subfamilies on the basis of sequence similarities. One of these subfamilies groups the following proteins [1,2]:
- accR, the Agrobacterium tumefaciens plasmid pTiC58 repressor of opine catabolism and conjugal transfer.
- agaR, the Escherichia coli aga operon putative repressor.
- deoR, the Escherichia coli deoxyribose operon repressor.
- fucR, the Escherichia coli L-fucose operon activator.
- gatR, the Escherichia coli galactitol operon repressor.
- glpR, the Escherichia coli glycerol-3-phosphate regulon repressor.
- gutR (or srlR), the Escherichia coli glucitol operon repressor.
- [...] the Bacillus subtilis transcription regulator of the sigK [...].
- [...] Escherichia coli hypothetical protein.
- yihW, an Escherichia coli hypothetical protein.
- yjfQ, an Escherichia coli hypothetical protein.
The 'helix-turn-helix' motif of these proteins is located in their N-terminal section.

[ 2] Bairoch A. Unpublished observations (1993). 
[0507] 159. dsrm 

[...] in humans, which is part of the cellular response to dsRNA.
[0509] Number of members: 116

[0510] 160. Dynamin family signature
Dynamin [1,2] is a microtubule-associated force-producing protein of 100 Kd which is involved in the production of microtubule bundles and which is able to bind and hydrolyze GTP. Dynamin is structurally related to the following proteins:
- Drosophila shibire protein (gene shi) [3], the homolog of mammalian dynamin. It seems to provide the motor for vesicular [...].
- Yeast vacuolar sorting protein VPS1 (or SPO15) [4], a protein which could [...].
- Yeast protein MGM1 [5], which is required for mitochondrial genome [...].
- Yeast protein DNM1, which is involved in endocytosis.
- [...], a family of closely related proteins.
[...] of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).
[0511] Consensus pattern: L-P-[KR]-G-[STN]-[GN]-[LIVM]-V-T-R

[ 5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).

[ 6] [...] New Biol. [...](1993).
[ 7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).

[0514] 162. E1-E2 ATPases phosphorylation site
E1-E2 ATPases (also known as P-type) are cation transport ATPases which form an aspartyl phosphate intermediate in the course of ATP hydrolysis. These ATPases include:
- Plasma membrane (H+) ATPases [reviewed in 4].
- (Na+, K+) ATPases (sodium pump) [reviewed in 5,6].
- Gastric (K+, H+) ATPases (proton pump).
- Calcium (Ca++) ATPases (calcium pump) from the sarcoplasmic reticulum (SR), the endoplasmic reticulum (ER) and the plasma membrane.
- Copper (Cu++) ATPases, which are involved in two human genetic disorders: Menkes syndrome and Wilson disease.
- Bacterial cadmium efflux (Cd++) ATPases.
- Magnesium (Mg++) ATPases.
- A probable cation ATPase from Rhizobium meliloti, involved in nitrogen fixation.

[ 1] Green N.M., MacLennan D.H. [...]
[ 2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).
[ 3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).
[ 4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).

[0516] 163. E1_N

E1 Protein, N terminal domain 
Number of members: 90 

[...] 2-oxoglutarate dehydrogenase and 2-oxoisovalerate [...]

[0519] 165. (ECH) Enoyl-CoA hydratase/isomerase signature
Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3,2-trans-enoyl-CoA isomerase (EC 5.3.3.8) (ECI) [2] are two enzymes involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA, and ECI shifts the 3-double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In mitochondria, ECH and ECI are separate yet structurally related monofunctional enzymes. Peroxisomes contain a trifunctional enzyme [3] consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), ECH and ECI are part of a multifunctional enzyme which contains both an HCDH and a 3-hydroxybutyryl-CoA epimerase domain [4]. A number of other proteins have been found to be evolutionary related to the ECH/ECI enzymes or domains:
- 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme involved in the butyrate/butanol-producing pathway.
- Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene menB) [5], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2). DHNA synthetase converts O-succinyl-benzoyl-CoA (OSB-CoA) to 1,4-dihydroxy-2-naphthoic acid (DHNA).
- 4-chlorobenzoate dehalogenase [6].
[LIVMFY]-

[ 1] Minami-Ishii N., Taketani S., Osumi T., [...]
[ 2] Mueller-Newen G., Stoffel W. Biol. Chem. Hoppe-Seyler 372:613-624(1991).

[ 4] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:[...](1990).

[ 6] [...] Scholten J.D., Chang K.-H., Liang P.-H., Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).

[0521] 166. (EF1BD) Elongation factor 1 beta/delta chain
Eukaryotic elongation factor 1 (EF-1) is involved in the GTP-dependent binding of aminoacyl-tRNAs to the ribosomes [1]. EF-1 is composed of four subunits: the alpha chain, which binds GTP and aminoacyl-tRNAs; the gamma chain, which probably plays a role in anchoring the complex to other cellular components; and the beta and delta (or beta') chains. The beta and delta chains stimulate the exchange of GDP bound to the alpha chain for GTP. They are proteins of around 23 to 31 Kd. Their C-terminal [...].

1050:241-247(1990).

[0523] 167. (EF1G_domain) Elongation factor 1 gamma, conserved domain

[...], elongation factor 2 and some tetracycline resistance proteins.

[ 1] Aoki H., Adams S.L., Turner M.A., Ganoza M.C. Biochimie 79:7-11(1997).

[0528] 170. (EF_TS) Elongation factor Ts signatures
In prokaryotes, elongation factor Ts (EF-Ts) participates in the elongation cycle of protein biosynthesis. It associates with the EF-Tu.GDP complex and induces the exchange of GDP to GTP. It remains bound to the aminoacyl-tRNA.EF-Tu.GTP complex up to the GTP hydrolysis stage on the ribosome. [...] algal chloroplast [2]. It is also present in [...] as a component of the chloroplast [...].

[0530] Consensus pattern: E-[LIVM]-[NV]-[SC]-[QE]-T-D-F-V-[SA]-[KRN]

[ 1] Bubunenko M.S., Kireeva M.L., Gudkov [...]
[...] their cytoplasmic domains.
Number of members: 30
[0533] [...] Paccaud JP, Thomas DY, Bergeron JJ, Nilsson T; J Cell Biol 1998;140:751-765.



172. ENV_poly protein 
ENV polyprotein (coat polyprotein) 
Number of members: 224 



[0534] 173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures
Two fungal enzymes involved in ergosterol biosynthesis act by reducing double bonds in precursors of ergosterol. These enzymes have been shown to be evolutionary related: C-14 sterol reductase (gene ERG24 in budding yeast and erg3 in Neurospora crassa) and C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1 in fission yeast). Their sequences are also related to that of the lamin B receptor, which is thought to anchor the lamina to the inner nuclear membrane. These proteins are hydrophobic and seem to contain seven or eight transmembrane regions. Two signature patterns were selected. The first one is apparently [...].

[0535] Consensus pattern: G-x(2)-[LIVM]-[...]

[ 1] Lai M.H., Bard M., [...] Kirsch D.R. Gene 140:41-49(1994).

[0539] 175. ER lumen protein retaining receptor signatures
Proteins that reside in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide (generally K-D-E-L or H-D-E-L) that serves as a signal for their retrieval (retrograde transport) from subsequent compartments of the secretory pathway. The signal is recognized by a receptor molecule which is believed to cycle between the cis side of the Golgi apparatus and the ER [1]. This protein is known as the ER lumen protein retaining receptor, or also as the 'KDEL receptor'. It has been characterized in a variety of species including fungi (gene ERD2), plants, Plasmodium, Drosophila and mammals. In mammals, two highly related forms of the receptor are known. Structurally, the receptor is a protein of about 220 residues that seems to contain seven transmembrane regions [2]. The N-terminal part (3 residues) is oriented toward the lumen, while the C-terminus is cytoplasmic. There are three lumenal and three cytoplasmic loops. Two signature patterns were developed. The first pattern spans most of the second transmembrane domain.

Consensus pattern: G-x-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y-
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L- 
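Because the ER retrieval signal described above is a C-terminal tetrapeptide, checking a protein for it only requires looking at its last four residues. The sketch below assumes a one-letter protein sequence that may end in a '*' stop symbol. Since the text says the signal is 'generally' K-D-E-L or H-D-E-L, the accepted tetrapeptides are a parameter rather than hard-coded. The function and constant names are hypothetical.

```python
# The two C-terminal retrieval signals named in the text; callers can pass others.
DEFAULT_SIGNALS = frozenset({"KDEL", "HDEL"})


def has_er_retrieval_signal(sequence: str, signals=DEFAULT_SIGNALS) -> bool:
    """True if the protein ends with one of the given C-terminal tetrapeptides."""
    seq = sequence.strip().upper().rstrip("*")  # drop a trailing stop symbol, if present
    return len(seq) >= 4 and seq[-4:] in signals


if __name__ == "__main__":
    print(has_er_retrieval_signal("MKWVTFISLLLLFSSAYSKDEL"))  # True
    print(has_er_retrieval_signal("MKDELAAA"))                # False: not at the C-terminus
```

Anchoring the check at the C-terminus matters: an internal K-D-E-L does not act as a retrieval signal in the scheme described above.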

[...] dehydrogenases. ETF transfers electrons to the main respiratory chain [...]. ETF is a heterodimer that consists of an alpha and a beta subunit and binds one molecule of FAD per dimer. A similar system also exists in some bacteria. The beta subunit is a protein of about 28 Kd which is structurally related to the bacterial nitrogen fixation protein fixA, which could play a role in a redox process [...]. Other related proteins are:
- Escherichia coli hypothetical protein ygcR.
As a signature pattern, [...] was selected.

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).

[0543] 177. Endonuclease III signatures
Escherichia coli endonuclease III [1] is a DNA repair enzyme that acts both as a DNA N-glycosylase, removing oxidized pyrimidines from DNA, and as an apurinic/apyrimidinic (AP) endonuclease, introducing [...].

[...] III (211 amino acids).
- Micrococcus luteus ultraviolet N-glycosylase/AP lyase, which acts on pyrimidine dimers.
- ORF10 in plasmid pFV1 of Methanobacterium thermoformicicum [4]. Restriction methylase M.MthTI, which is encoded by this plasmid, [...] resulting in G-T mismatches. This protein could [...].
- Yeast hypothetical protein YAL015c.
- Fission yeast hypothetical protein [...].
- Caenorhabditis elegans hypothetical protein R10E4.5.
- Methanococcus jannaschii hypothetical protein [...].
In endonuclease III, the 4Fe-4S cluster is bound by four cysteines which are all located in [...]. A similar region is also present in the central section [...].

Consensus pattern: [...]-[KRAGL]-C-x(2)-C-x(5)-C [The four C's are 4Fe-4S ligands]

[LIVMFYW]-[GANK]-

Kuo C.-F., McRee D.E., [...] Tainer J.A. Science 258:434-440(1992).

[...] 6294-6304.

[...] Nucleic Acids Res 1993;21:2521-2522.
[0551] 180. ENTH 

ENTH domain
[1] Kay BK, Yamabhai M, Wendland B, Emr SD; Medline: 99156083. Identification of a novel domain shared by putative components of the endocytic and cytoskeletal machinery.

The function of the ENTH domain is unknown.
[0554] Number of members: 29 
[0555] 181. (eIF-1A) Eukaryotic translation initiation factor 1A
Eukaryotic translation initiation factor 1A (eIF-1A) [...] ribosome dissociation into subunits and stabilizes the binding of [...].

[...] initiation of protein synthesis is not known. It appears [...]. It seems to be the only eukaryotic protein containing [...] hypusine. Hypusine is derived from lysine by the post-translational addition of a butylamino group (from spermidine) [...]. The hypusine group is essential to the function of eIF-5A. A protein similar to eIF-5A is found in archaebacteria such as Sulfolobus acidocaldarius [...] and could play a similar role in protein [...].

[...] Hershey J.W.B. Mol. Cell. Biol. 11:3105-3114(1991).
[0560] 183. (efhand) S-100/ICaBP type calcium binding domain
The S-100 proteins are small dimeric acidic calcium-binding proteins [...] in the brain. They have two different types of calcium-binding sites: a low affinity one with a special structure and a 'normal' EF-hand type high affinity site. The vitamin-D dependent intestinal calcium-binding protein (calbindin) is also a member of this family of proteins, but it does not form dimers. In the last years, [...]. The exact function of these proteins is not yet known. Proteins known to belong to this family include:
- Calpactin I light chain (p11; 42C; S100A10).
- Calgranulin B (MIF related protein 14 (MRP-14); P14).
- [...] (S100C).
- Placental calcium-binding protein (CAPL).
- Protein S-100D (S100A5).
- Protein S-100E (S100A3).
- Protein S-100L (CAN19).
- Protein S-100P (S100E).
- Psoriasin (S100A7).
- Chemotactic cytokine CP-10 [5].
- [...] This is a large intermediate filament-associated protein that [...] an S-100 type domain in its N-terminal extremity.
A number of [...] have lost their calcium-binding properties. A pattern [...]
[...]-[LIVMF]

[ 3] Kligman D., Hilt D.C. Trends Biochem. Sci. [...]

(1993).

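EF-hand calcium-binding domain

Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand [1 to 5]. This type of domain consists of a twelve residue loop flanked on both sides by a twelve residue alpha-helical domain. In an EF-hand loop, the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding calcium (bidentate ligand).

Listed below are the proteins which are known to contain EF-hand regions. For each type of protein, the total number of EF-hand regions known or supposed to bind calcium is indicated in parentheses (Ca=n). [...] found in the S-100/ICaBP family of proteins [6].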

- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- [...]
- Cell division control protein 31 (gene CDC31) from yeast (Ca=2?).
- Intestinal calcium-binding protein (ICaBP) (Ca=2).
- [...] (CFAG) and [...] (MRP-14) (Ca=2).
- Osteonectin (basement membrane protein [...]).
- Parvalbumins alpha and beta (Ca=2).
- Serine/threonine protein phosphatase rdgC [...].
- Spectrin alpha chain (Ca=2).
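The EF-hand loop geometry described for this family (calcium ligands at loop positions 1, 3, 5, 7, 9 and 12, labelled X, Y, Z, -Y, -X and -Z, with an invariant Glu or Asp at position 12) can be expressed as a small lookup. The sketch assumes the 12-residue loop has already been located. The example loop DKDGDGTITTKE is the first EF-hand loop of calmodulin, used here purely as an illustration; the function and constant names are hypothetical.

```python
# Loop positions (1-based) that coordinate calcium in a canonical
# 12-residue EF-hand loop, with their conventional vertex labels.
LIGAND_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ligand_residues(loop: str) -> dict[str, str]:
    """Map each vertex label to the residue found at that position of the loop."""
    if len(loop) != 12:
        raise ValueError("expected a 12-residue EF-hand loop")
    return {label: loop[pos - 1] for pos, label in LIGAND_POSITIONS.items()}


if __name__ == "__main__":
    residues = ligand_residues("DKDGDGTITTKE")
    print(residues)
    # Position 12 (-Z) should carry the invariant Glu or Asp (bidentate ligand).
    print(residues["-Z"] in "DE")
```

The lookup only reports which residue occupies each ligand position; whether that residue binds through its side chain, its backbone carbonyl or a bridging water is not modelled here.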

[...] positions 1 (X), 3 (Y) and 12 (-Z) [...]

Qusnl. BM. 52:499-510(1987) 30 522-562(1990). 

I J "-^-VSSit " " 18:98.1030 99,,. 

• iiSHr^^srs-98,,98, 

[...] FEBS Lett. 269:454-458(1990).

[...] dimeric enzyme [...].

[...] Consensus pattern: [...]-K-x-N-Q-[...]-G-[ST]-[...]-[ST]-[DE]-[STA]
[0563] [...] J. Biol. Chem. 264:3685-3693(1989).
[0564] 185. (F_actin_cap_A) F-actin capping protein alpha subunit

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed end), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does not sever actin filaments. The F-actin capping protein is composed of two unrelated subunits: alpha and beta. The [...] sequence is well conserved in eukaryotic [...]. [...] in the central section of the alpha subunit [...].

Consensus pattern: V-H-[FY](2)-E-D-G-N-V
[...] Cytoskeleton 18:204-214(1991).

SV 'yS S. EE**. SJ. Ha„»r JW cen ,«7S1:»M19. 
[0569] 187. F-protein 

Negative factor (F-protein) or Nef

[0570] [1] Arold S, Franken P, Strub M-P, Hoh F, Benichou S, [...] Dumas C; Medline: 98035457. The crystal structure of HIV-1 Nef protein bound to the Fyn kinase SH3 domain suggests a role for this complex in altered T cell receptor signalling. Structure 1997;5:[...].

[...] of AIDS by its interaction with cellular proteins involved in signal transduction.
[0572] Number of members: 1013

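[0573] 188. (FAD_binding_2) Fumarate reductase / succinate dehydrogenase FAD-binding site
In bacteria, two distinct, membrane-bound, enzyme complexes are responsible for the interconversion of fumarate and succinate (EC 1.3.99.1): fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulfur protein, and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B. In eukaryotes, mitochondrial succinate dehydrogenase is an enzyme composed of two subunits: a FAD flavoprotein and an iron-sulfur protein.

[0575] The flavoprotein subunit is a protein to which FAD is covalently bound, at a histidine residue located in the N-terminal section of the protein [1]. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species [2].

[0576] Consensus pattern: R-[ST]-H-[ST]-x(2)-A-x-G-G [H is the FAD binding site]
Sequences known to belong to this class detected by the pattern: ALL.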

[ 1] Blaut M., Whittaker K., [...]

[ 2] Birch-Machin M.A., [...] Turnbull D.M. J. Biol. Chem. 267:11553-11558(1992).

[0577] 189. (FA_desaturase) Fatty acid desaturases signatures

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionary related. Family 1 is composed of:
- Stearoyl-CoA desaturase (SCD) [1]. SCD is a key regulatory enzyme of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoAs, converting palmitoyl- and stearoyl-CoA into palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic reticulum. As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected. This region is rich in histidine residues and in aromatic residues.
Family 2 is composed of:
- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]. These enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12) position of fatty acids bound to membrane glycerolipids. This enzyme is involved in chilling tolerance; the phase transition temperature of lipids of cellular membranes depends on the degree of unsaturation of fatty acids of the membrane lipids.

Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y
Consensus pattern: [...]

[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0579] 190. Fructose-1,6-bisphosphatase active site [...]
Fructose-1,6-bisphosphatase (EC 3.1.3.11) (FBPase) [1], a regulatory enzyme in gluconeogenesis, catalyzes the hydrolysis of fructose 1,6-bisphosphate to fructose 6-phosphate [...] in most organisms. Sedoheptulose-1,7-bisphosphatase [...] [2] is an enzyme found in plant chloroplast and in photosynthetic [...] to sedoheptulose 7-phosphate, a step in the Calvin's reductive pentose [...] cycle. It is functionally and structurally related to FBPase. In mammalian FBPase a [...] catalytic mechanism [3]. The region around this [...] the active site lysine is replaced by an arginine.

[1] Benkovic S.J., DeMaine M.M. Adv. [...]
[...] 205:1053-1059(1992).


are: - L-fucolokinase (EC [...]) [...]. - [...] (gene glpK). - Xylulose [...]. - L-xylulose kinase (EC [...]) (gene lyxK). These enzymes are proteins of from 480 to 520 amino acid residues. As consensus patterns for this family of kinases two conserved regions were selected, one in the central section, the other in the C-terminal section.

[0581] Consensus pattern: [MFYGS]-[...]-[LIVMF]-x-[DENQTKR]-[ENQH]-
Consensus pattern: [GS][...]


FKBP [1,2,3] is the major high-affinity binding protein [...] peptidyl-prolyl cis-trans isomerase activity [...] (PPIase). PPIase is an enzyme that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [4]. At least three different forms of FKBP are known in [...]: - FKBP-12, which is cytosolic and inhibited by both FK506 and rapamycin. - FKBP-13, which is [...] and inhibited by both FK506 and rapamycin. - FKBP-25, which is preferentially inhibited by [...]. FKBPs are evolutionarily related and show extensive similarities [5,6,7] with the following proteins: - [...]. - Mammalian hsp binding immunophilin (HBI) (also called p59). HBI is a protein which binds to hsp90 and contains two FKBP-like domains, [...] which seems to be functional. - The C-terminal part of the [...], a protein associated with macrophage infection by an unknown [...]. - [...] a protein with a N-terminal FKBP domain followed by an histidine-rich [...]. - [...] fkpA. - Escherichia coli fklB (FKBP22). - Escherichia coli slpA. - Bacterial trigger factor [...]. - [...] chrysomallus FK506-binding protein. - Chlamydia trachomatis 27 Kd membrane [...]. - [...] strain C114 PPIase. - Probable PPIases from Haemophilus influenzae [...] (MJ0278 and MJ0825), Pseudomonas [...].

[...] the complete domain.

Consensus pattern: [...]-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN]-
Consensus pattern: [...]-[GAQ]-x(2)-[AG]-[FY]-G-

[...] 346:[...](1990).
[...] Galat A. Eur. J. Biochem. 216:689-707(1993).

[...]genase.

[...] of 150 to 160 residues that contain three transmembrane segments. As a signature pattern, [...]
[0589] Consensus pattern: G-x(3)-F-E-R-[...]
[...] J. Biol. Chem. 271:22203-22210(1996).

[...] a dehydrogenase domain and a heme-binding domain [...] lactate into pyruvate. - Glycolate oxidase (EC [...]) [...], a peroxisomal enzyme that catalyzes the conversion of glycolate and oxygen [...] hydrogen peroxide. - Long chain alpha-hydroxy acid oxidase from rat (EC 1.[...]), a peroxisomal enzyme. - Lactate 2-[...] (EC 1.13.12.4) (lactate oxidase) from Mycobacterium smegmatis, [...] to acetate, carbon dioxide and water. - (S)-mandelate dehydrogenase from Ps[...], [...] mandelate to benzoylformate. The first step [...] mechanism of these enzymes is the abstraction of the proton from the alpha-carbon of the substrate [...] can subsequently attach to the N5 atom of FMN. A conserved histidine has been [...] removal of the proton. The region around this [...] which is involved in substrate binding.

[...] Lederer F. J. Biol. Chem. 266:20877-[...](1991).

195. Flavin-binding monooxygenase-like (FMO-like)
This family includes FMO proteins, cyclohexanone monooxygenase [...]
[0595] 196. (FPGS)

Folylpolyglutamate synthase signatures [...]

[0596] Folylpolyglutamate synthase (EC 6.3.2.17) [...] dependent addition of glutamate moieties [...] (gene folC) and eukaryotes. We developed two [...]

[...] sequences known to belong to this class detected by the pattern ALL.

[0600] [1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., [...] Adv. Exp. Med. Biol. 338:629-634(1993).

[0601] 197. FYVE zinc finger

[0602] The FYVE zinc finger is named after [...] Fab1, YOTB/ZK632.12, Vac1 and EEA1 [1]. The FYVE finger has been shown [...]. The FYVE finger has eight potential zinc [...] two histidines in a motif R+HHC+XCG, where [...] residues but are clearly related [...].
[...] 1996;271:24048-24054. [2] Gaullier M, [...]

[0604] 198. F_actin_cap_B 

[0607] [...]

[0608] [1] Amatruda J.F., Cannon J.F., Tatchell K., Hug C., Cooper J.A. Nature 344:352-354(1990).

[0609] 199. Isopenicillin N synthetase [...]
Isopenicillin N synthetase (IPNS) [1,2] [...] of penicillin and cephalosporin. In the presence of oxygen, it removes four hydrogen atoms from [...](L-alpha-aminoadipyl)-L-cysteinyl-D-valine to form the azetidinone and thiazolidine rings [...]. [...] iron and ascorbate [...] amino-acid residues. Two cysteines are conserved in [...]; they could be involved in iron-binding and/or substrate-binding. [...] enzyme involved in cephalosporin biosynthesis. The DAOCS domain, which is [...] related to IPNS, catalyzes the step from penicillin N to deacetoxycephalosporin C. [...] Streptomyces clavuligerus [...]. [...] enzymes were derived, centered around the conserved cysteine residues.
[0610] Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL][...]
Consensus pattern: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG]-

[...] the first step of the processing of pre-rRNA. [...] RNAs [2]. Fibrillarin is an extremely well conserved protein of about 320 amino acid residues. Structurally it consists of three different domains: - An N-terminal domain [...] a number of dimethylated arginine residues. - [...] which resembles that of RNA-binding proteins and contains an octameric sequence similar to the [...] consensus found in such proteins. - A C-terminal alpha-helical domain. A protein [...] archaebacteria [...] processing. It lacks the [...]

[1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-9[...]
[2] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).
[3] Agha-Amiri K. J. Bacteriol. 176:2124-2127(1994).

[...] Breton C, Oriol R, Imberty A; Glycobiology 1998;8:87[...]

[0618] 203. 2Fe-2S ferredoxins [...]
[...] in a wide variety of metabolic reactions. Ferredoxins can be divided into several subgroups [...] cluster(s) and according to sequence similarities. [...] together the following proteins that all bind a single 2Fe-2S iron-sulfur cluster:
- [...] is involved in cholesterol side [...].
- [...] which transfers electrons from putidaredoxin [...].
- [...] capsulatus ferredoxin VI [5], which may transfer electrons to a [...].
- [...] in the toluene degradation operon (gene xylT) [...] putida.
- [...] bifunctional [...]redoxin/ferredoxin [...] reductase electron transfer component of the benzoate [...] (gene benC) from Acinetobacter calcoaceticus, the toluene 4-monooxygenase component [...], the [...] domain of phenol hydroxylase protein P5 (gene dmpP) from Pseudomonas [...], and the N-terminal domain of methane monooxygenase component [...].
[...] cluster. Three of these [...]

Consensus pattern: C-[...]-x-[STAMV]-C-[...]-C-[...] [The three C's are 2Fe-2S ligands]-

[...] Vanderleyden J., De Mot R. J. Bacteriol. 177:676[...](1995).
[4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992).
[5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 222:933-939(1994).
[6] Amemiya K. EMBL/GenBank: X51607.
[...] enzyme is composed of [...] subunits: A, B [...]
[...] ferredoxin-like domains in the N-terminal part of the [...] reductase. The B subunit [...] ferredoxin-like domains and which probably binds twelve 4Fe-4S [...] protein which is predicted to carry two 4Fe-4S centers [...] with a N-terminal region belonging to the [...]

[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[2] Otaka E., Ooi T. J. Mol. Evol. 26:257-267(1987).
[3] Beinert H. FASEB J. 4:2483-2492(1990).
[4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).
[5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988).

[0625] 205. NifH/frxC

Nitrogenase (EC 1.18.6.1) [1] is the enzyme system primarily responsible for biological nitrogen fixation. Nitrogenase is an [...] ATP. A number of proteins are known to be evolutionarily related [...] (or chlL) protein [3]. FrxC is encoded on the [...]; its exact function is not known, but it could act as an electron carrier in the conversion [...] in chlorophyll synthesis. There are a number [...] of the 4Fe-4S cluster. Two [...]

[0626] Consensus pattern: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G [C binds the iron-sulfur center]-
Consensus pattern: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P [C binds the iron-sulfur center]-

[1] Pau R.N. Trends Biochem. Sci. 14:183-186(1989).
[...] Rees D.C. Science 257:1653-1659(1992).
[3] Burke D.H., Alberti M., Hearst J.E. J. Bacteriol. 175:2407-2413(1993).

[0627] 206. Ferritin iron-binding [...]

Ferritin [1,2] is one of the major non-heme iron storage proteins [...]. It consists of a mineral core of hydrated ferric oxide, and a multi-subunit protein shell which englobes the [...] environment. In animals the protein is mainly cytoplasmic and there are generally two or more genes that encode [...] (or closely related) subunits (in mammals there are two subunits which [...]). [...] chloroplast [3]. There are a number [...] signature patterns were selected [...]. The second pattern is located in the [...]

Consensus pattern: [...] [The 3 E's are potential iron ligands]
[...] potential iron ligands]-

[1] Crichton R.R., Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987).

[...] (1990).

[0629] 207. Intermediate [...]

Intermediate filaments (IF) [...] are [...] primordial components of the cytoskeleton and the nuclear envelope. They [...] nm wide. IF proteins are members of a very large multigene family of proteins which has been [...] groups: - Type I: Acidic cytokeratins. - Type II: Basic cytokeratins. - Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin and plasticin. - Type IV: Neurofilaments L, H and M, alpha-internexin [...]. All IF proteins are structurally similar in that they consist of: a [...] in coiled-coiled alpha-helices [...]; a N-terminal non-helical domain (head) of variable length; and a C-terminal domain (tail) [...] limited sequence [...]

[1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995).
[2] Steinert P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988).
[3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990).

[...] one FMN molecule, which serves as a redox-active prosthetic group [...] eukaryotic algae. The signature [...]

[1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989).

[0633] 209. Growth factor and cytokines [...]

A number of receptors for lymphokines, [...] hormone-related molecules have been found [1 to 5] to share a common binding domain. Receptors known to belong to this family are: - Cytokine receptor common beta chain. This chain is common to [...]. - Cytokine receptor common gamma chain. This chain is common to the [...]. - Ciliary neurotrophic factor receptor (CNTFR). - Erythropoietin receptor [...]. - Granulocyte colony-stimulating factor receptor (G-CSFR). - Granulocyte-macrophage colony-stimulating factor receptor (GM-CSFR). - Interleukin-2 receptor beta chain (IL2R-beta). - Interleukin-3 receptor [...]. - Interleukin-5 receptor alpha chain (IL5R). - Interleukin-6 receptor [...]. - Interleukin-7 receptor alpha chain (IL7R). - Interleukin-9 receptor (IL9R). - [...] hormone receptor [...]. - Prolactin receptor (PRLR). - Thrombopoietin receptor (TPOR).

The conserved region constitutes all or part of the [...] residues long. In the N-terminal of this domain there are two [...] to be involved in disulfide bonds.

[Schematic: extracellular domain with the conserved cysteines (C); transmembrane segment; cytoplasmic domain.]

[...] The first one is derived from the first N-terminal disulfide [...]

Consensus pattern: [STGL]-x-W-[SG]-x-W-S-

[1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989).
[2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990).
[3] Cosman D., Lyman S.D., Idzerda R.L., Beckmann M.P., Park L.S., Goodwin R.G., March C.J. Trends Biochem. Sci. 15:265-270(1990).
[4] D'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989).
[5] D'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990).

[0635] 210. Phosphoribosylglycinamide formyltransferase active site (formyl_transf)

Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2) (GART) [1] catalyzes the third step in de novo purine biosynthesis, the transfer of a formyl group to 5'-phosphoribosylglycinamide. In higher eukaryotes, GART is part of a multifunctional enzyme polypeptide that catalyzes three of the steps of purine biosynthesis. In bacteria, plants and yeast, GART is a monofunctional protein of about 200 amino-acid residues. In the Escherichia coli enzyme, an aspartic acid residue has been shown to be involved in the catalytic mechanism. The region around this active site residue is well conserved in GART from prokaryotic and eukaryotic sources and can be used as a signature pattern. Mammalian formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [2] is a cytosolic enzyme responsible for the NADP-dependent decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of about 900 amino acids consisting of three domains; the N-terminal domain (200 residues) is structurally related to GARTs. Escherichia coli methionyl-tRNA formyltransferase (EC 2.1.2.9) (gene fmt) [3] is the enzyme responsible for modifying the free amino group of the aminoacyl moiety of methionyl-tRNA(fMet). The central part of fmt seems to be evolutionarily related to the GART active site region.

[0636] Consensus pattern: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-x(6)-[LIVM] [D is the active site residue]-
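The consensus patterns in this section appear to use PROSITE pattern notation. Assuming the usual conventions (`x` for any residue, `[ABC]` for one of the listed residues, `{ABC}` for any residue except those listed, `(n)` or `(n,m)` for repeat counts, `<` and `>` for N- and C-terminal anchors), a pattern such as the GART signature can be translated into a regular expression and used to scan a sequence. The sketch below is illustrative only: `prosite_to_regex` is a hypothetical helper, and the test sequence is made up to satisfy the pattern rather than taken from a real GART protein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex.

    Assumed syntax: 'x' = any residue, [ABC] = one of A/B/C,
    {ABC} = any residue except A/B/C, e(n) / e(n,m) = repeat count,
    '<' / '>' = anchor at the N- / C-terminus.
    """
    pattern = pattern.strip().rstrip(".-")  # drop a trailing '.' or '-'
    pieces = []
    for element in pattern.split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = match.groups()
        if core == "x":
            token = "."
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"
        else:
            token = core  # a single residue or a [..] class is already valid regex
        if high is not None:
            token += "{" + low + "," + high + "}"
        elif low is not None:
            token += "{" + low + "}"
        pieces.append(prefix + token + suffix)
    return "".join(pieces)


GART_PATTERN = ("G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-"
                "x(2)-[LIVT]-x(6)-[LIVM]")
regex = prosite_to_regex(GART_PATTERN)

# Made-up sequence built to satisfy the pattern; not a real GART protein.
toy = "MKK" + "GASIAFVADALDAGAALAAAAAAL" + "EE"
hit = re.search(regex, toy)
if hit:
    # The invariant D is the 12th pattern element, 11 residues after the start.
    print("hit at", hit.start() + 1, "putative active-site D at", hit.start() + 12)
```

Because every element of this particular pattern has a fixed length, the invariant aspartate (assumed here to be the residue the annotation marks as the active site) always lies 11 residues after the start of a hit; patterns with ranges such as `x(2,4)` would need capture groups instead of a fixed offset.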

[1] Inglese J., Smith J.M., Benkovic S.J. Biochemistry 29:6678-6687(1990).
[2] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).
[3] Guillon J.-M., Mechulam Y., Schmitter J.-M., Blanquet S., Fayat G. J. Bacteriol. 174:4294-4301(1992).

[0637] 211. G10 protein signatures

A Xenopus protein known as G10 [1] has been found to be highly conserved in a wide range of eukaryotic species. The function of G10 is still unknown. G10 is a protein of about 17 to 18 Kd (143 to 157 residues) which is hydrophilic and whose C-terminal half is rich in cysteines and could be involved in metal-binding. As signature patterns, two of these cysteine-rich segments were selected.

[0638] Consensus pattern: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P-
Consensus pattern: C-x-H-C-G-C-[KRH]-G-C-[SA]-

[0639] [1] McGrew L.L., Dworkin-Rastl E., Dworkin M.B., Richter J.D. Genes Dev. 3:803-815(1989).
[0640] 212. G-protein alpha subunit

[0641] G proteins couple receptors of extracellular signals to intracellular signaling pathways. The G protein alpha subunit binds guanyl nucleotide and is a weak GTPase. Number of members: 195

[1] Coleman DE, Berghuis AM, Lee E, Linder ME, Gilman AG, Sprang SR, Science 1994;265:1405-1412.
[2] How G proteins work: a continuing story. Coleman DE, Sprang SR, Trends Biochem Sci 1996;21:41-44.

[0642] 213. Glucose-6-phosphate dehydrogenase active site (G6PD)

Glucose-6-phosphate dehydrogenase (EC 1.1.1.49) (G6PD) [1] catalyzes the first step in the pentose phosphate pathway, the oxidation of glucose-6-phosphate to gluconolactone 6-phosphate. A lysine residue has been identified as a reactive nucleophile associated with the activity of the enzyme. The sequence around this lysine is totally conserved from bacterial to mammalian G6PDs and can be used as a signature pattern.
[0643] Consensus pattern: D-H-Y-L-G-K-[EQK] [K is the active site residue]-
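Because the G6PD signature contains only single-residue positions, it can be written directly as a regular expression, and the annotated lysine sits at a fixed offset inside every hit. The following minimal Python sketch reports that lysine's position; `active_site_lysines` is a hypothetical helper name, and the input is a short made-up fragment used only to exercise the function, not a verified G6PD sequence.

```python
import re

# D-H-Y-L-G-K-[EQK] has no variable-length elements, so it maps one-to-one
# onto this regular expression; the lysine is element 6 (offset 5).
G6PD_SIGNATURE = re.compile(r"DHYLGK[EQK]")
LYSINE_OFFSET = 5


def active_site_lysines(sequence: str) -> list[int]:
    """Return 1-based positions of lysines flagged by the G6PD signature."""
    return [m.start() + LYSINE_OFFSET + 1
            for m in G6PD_SIGNATURE.finditer(sequence.upper())]


# Short hypothetical fragment used only to exercise the function.
print(active_site_lysines("MSRIDHYLGKEMVQN"))
```

Upper-casing the input is a convenience assumption so that lower-case sequence files are handled the same way.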

[0644] [1] Jeffery J., Persson B., Wood I., Bergman T., Jeffery R., Joernvall H. Eur. J. Biochem. 212:41-49(1993).
[0645] 214. GATA-type zinc finger domain
The GATA family of transcription factors are proteins that bind to DNA sites with the consensus sequence (A/T)GATA(A/G), found within the regulatory region of a number of genes. Proteins currently known to belong to this family are:

- GATA-1 [1] (also known as Eryf1, GF-1 or NF-E1), which binds to the GATA region of globin genes and other genes expressed in erythroid cells. It is a transcriptional activator which probably serves as a general 'switch' factor for erythroid development. - GATA-2 [2], a transcriptional activator which regulates endothelin-1 gene expression in endothelial cells. - GATA-3 [3], a transcriptional activator which binds to the enhancer of the T-cell receptor alpha and delta genes.

- GATA-4 [4], a transcriptional activator expressed in endodermally derived tissues and heart. - Drosophila protein 
pannier (or DGATAa) (gene pnr) which acts as a repressor of the achaete-scute complex (as-c). - Bombyx mori BCFI 
[5], which regulates the expression of chorion genes. - Caenorhabditis elegans elt-1 and elt-2, transcriptional activators of genes containing the GATA region [...]. - Ustilago maydis urbs1 [7], a protein involved in the repression of the biosynthesis of siderophores.

[...] a pair of highly similar 'zinc finger' [...]. [...] contain a single zinc finger motif highly related to those [...] factors. These proteins are: - Drosophila box A-binding factor (ABF) (also known [...]), which may function as a transcriptional [...] protein and may play a key role in the [...] body. - Emericella nidulans areA [8], a transcriptional activator which mediates nitrogen metabolite [...]. - Neurospora crassa nit-2 [9], a transcriptional activator which turns on the expression of genes coding [...] the use of a variety of secondary nitrogen sources, during conditions of nitrogen [...]. - [...] proteins 1 and 2 (WC-1 and [...]). - [...] a negative [...].

[...] zinc ligands]

[1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. [...]

[...] M.E., Temizer D.T., Clifford J.A., Quertermous T. [...] EMBO J. 10:1187-1192(1991).

[6] Hawkins M.G., McGhee J.D. [...] 13:7091-7100(1993).

[...] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990).

[0647] 215. Glutamine amidotransferases [...]
A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists [...] as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been [...] purF-type). Class-I GATase domains have been found in the following enzymes: - The second component of anthranilate synthase (AS) [...]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the [...] anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component of AS is part of a multifunctional protein [...] steps of the biosynthesis of tryptophan. - The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.[...]), a dimeric prokaryotic enzyme that functions in the pathway that catalyzes the [...] from chorismate and glutamine. The second component (gene pabA) provides [...]. - CTP synthase (EC 6.[...]). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the ATP-dependent [...]. CTP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section [1]. - GMP synthase (glutamine-hydrolyzing) (EC [...]). GMP synthase catalyzes the ATP-dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the [...]. - Glutamine-dependent carbamoyl-phosphate synthase (EC 6.[...]) (GD-CPSase); an enzyme involved in both [...] biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene [...]) [...] activity, while the small chain (gene carA) [...]. [...] biosynthesis is also composed of two subunits [...].
[0648] [...]
[...] active site residue]



[1] Buchanan J.M. Adv. Enzymol. [...]
[7] Tso J.Y., Hermodson M.A., [...]

[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[...] 2.7.1.39) - Mevalonate kinase (EC 2.7.[...]) [...]

Consensus pattern: [LIVMHPK]-x-[...]
[...](1991).

[...] of the protein.
[...]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]-
[0659] Consensus pattern: G-S-H-P-T-x-[...]
Consensus pattern: A-G-Q-x-[...]
[0660] 221. (GLFV_dehydrog)
[...] yeast GluDH which has Leu [...]
[3] Nagata S., Tanizawa K., Esaki N., [...]




- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. [...] component II provides the GATase activity. In some [...] steps of the biosynthesis of tryptophan. - [...] synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme [...]. The second component (gene pabA) [...]. - [...] of pyrimidine, the ATP-[...] distinct domains; the GATase domain is in the [...]. - [...] catalyzes the ATP-dependent formation of [...] distinct domains; the GATase domain is in the N-terminal section [5]. - Glutamine-dependent carbamoyl-phosphate synthase [...] (GD-CPSase); an enzyme involved in both arginine and pyrimidine biosynthesis and which [...] from glutamine and carbon dioxide. In bacteria [...] of two subunits: the large chain (gene carB) provides the CPSase activity, while [...] provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis also [...] CPA1 (GATase), and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). - [...] hisH, an enzyme that catalyzes the [...] step in the biosynthesis of histidine in prokaryotes.

[...] it is replaced by Ala.

[1] Buchanan J.M. Adv. Enzymol. [...]
[2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[4] Crawford I.P. Annu. Rev. [...]
[...] J.M. J. Biol. Chem. 260:3350-3354(1985).

[0671] 224. Glutathione peroxidase [...]
Glutathione peroxidase (EC 1.[...]) (GSHPx) [...] is an enzyme that catalyzes the reduction of hydroxyperoxides by glutathione. Its main function is to protect against [...] endogenously formed hydroxyperoxides. In higher vertebrates at least four forms of GSHPx are known: [...] form (GSHPx-1), a gastrointestinal cytosolic form (GSHPx-GI) [3], a [...], and an epididymal secretory form (GSHPx-EP). In addition to these [...] function [5] has been [...] parasites such as Brugia pahangi the major [...]

[...] site selenocysteine residue]
Consensus pattern: [LIV]-[AGD]-F-P-[CSHNG]-Q-

[1] Mannervik B. Meth. [...]
[...] Tainer J.A., Hallewell R.A. Protein Eng. 2:239-246(1988).

[7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).
[0673] 225. (GST)

Glutathione S-transferases [...]

[0674] Function: conjugation [...]. Also included in the alignment, but are not GSTs: S-crystallins from [...]. Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity [...]. [...] proteins in plants and stringent starvation [...]. Supported by HMM and manual alignment inspection. Alignment spans entire protein.

[0675] 226. GTP1/OBG family signature

A widespread family of GTP-binding proteins [...] characterized [1,2]. This family currently includes: - Mouse and Xenopus protein DRG. - Human protein [...]. - Halobacterium cutirubrum [...]. - Bacillus subtilis protein obg. Obg has been experimentally shown to bind GTP. - E[...] hypothetical protein [...]. - [...] hypothetical protein HI0877. - Mycoplasma genitalium [...]. - [...] YAL036C (FUN11). - Yeast hypothetical protein YGR173w. The function of the proteins that belong to [...] polypeptides of about 40 to 48 Kd which contain the five small [...] region [...]

[...] Sazuka T., Tomooka Y., Ikawa Y. [...]
[0677] 227. (GTP_EFTU1)

ATP/GTP-binding site motif A (P-loop)

[0678] From sequence comparisons [...] it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that [...] share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] [...]. There are numerous ATP- or GTP-binding proteins in which the P-loop is found. Listed below are a number of protein families for which the relevance of the presence of such motif has been noted:
- [...] (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and [...] (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <[...]>).
- [...] kinase (see <PDOC00524>).
- Thymi[...].
- Shik[...].
- Nitrogenase iron protein family (nifH/[...]).
- [...] (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, [...], etc.).
- Ras family of GTP-binding proteins (Ras, [...]).
- Nuclear protein ran (see [...]).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion [...] (see <[...]>).
Not all ATP- or GTP-binding proteins are picked up by this motif. A number [...] the structure of their ATP-binding site is completely different from that of the [...]. [...] are the E1-E2 ATPases or the glycolytic [...]. [...] in a slightly different form; this is the case for [...].

Consensus pattern: [AG]-x(4)-G-K-[ST]-
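Because the P-loop signature is short, a single protein can contain overlapping occurrences, and an ordinary regex search would miss the second one since each match consumes its residues. A minimal Python sketch, assuming the pattern is read in PROSITE notation, wraps the regex in a zero-width lookahead so that every start position is reported. The helper name `find_p_loops` is hypothetical; the first test input is a short Ras-like N-terminal fragment and the second is a synthetic string chosen to show overlapping hits.

```python
import re

# [AG]-x(4)-G-K-[ST] written as a regex; the zero-width lookahead lets
# finditer report overlapping occurrences instead of skipping them.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every P-loop hit."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(sequence)]


print(find_p_loops("MTEYKLVVVGAGGVGKSALTIQ"))
print(find_p_loops("GGGGGGKTGKT"))
```

As the text notes, a hit is only suggestive: many proteins with this motif do not bind nucleotides, and some nucleotide-binding proteins lack it.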

[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).

[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).

[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. [...]

[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).

[7] [...] Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:[...]

[...] Gorbalenya A.E., Koonin E.V., [...]

[0679] GTP-binding elongation [...]
Elongation factors [1,2] are proteins [...] chains in protein biosynthesis. In both prokaryotes and eukaryotes, there are three distinct types of elongation [...], as described in the following table:

Eukaryotes   Prokaryotes   Function
EF-1alpha    EF-Tu         Binds GTP and an aminoacyl-tRNA; delivers [...]
[...]        [...]         [...] the regeneration of GTP-EF-1a.
EF-2         EF-G          Binds GTP [...] to the P site.

The GTP-binding elongation factor family also includes the following proteins:
- [...] GTP-binding subunits [3]. These proteins interact with release factors that bind to [...] stop codon at their decoding site and help them to induce release of the nascent polypeptide. The yeast protein is known as SUP2 (and also as SUP35, SUF12 or GST1) and the human homolog as GST[...].
- [...] (gene prfC) [...]3 is a class-II RF, a GTP-binding protein that interacts [...] RFs (see <PDOC00607>) and enhances their activity [4].
- Prokaryotic GTP-binding protein lepA and its homolog in yeast [...] and in Caenorhabditis elegans (ZK1236.1).
- Yeast HBS1 [5].
- Rat statin S1 [6], a protein [...] similar to EF-1alpha.
- Prokaryotic selenocysteine-specific [...], which seems to replace EF-Tu for the insertion of selenocysteine directed by the UGA codon.
- The tetracycline resistance proteins tetM/tetO [8,9] from various bacteria such as Campylobacter jejuni, Enterococcus [...] and Ureaplasma urealyticum. Tetracycline binds to the prokaryotic ribosomal 30S subunit and [...]. These proteins [...] inhibitory effect of tetracycline on protein [...].
- [...] nodulation protein nodQ [10].
- Escherichia coli hypothetical protein yihK [11].
In [...] to be involved in a conformational [...] EF-1alpha/EF-Tu as well as EF-[...]

[...]RNQ]-

[6] Ann D.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao P.L., Lee M. [...] J. Biol. Chem. 266:10429-10437(1991).
[7] Forchhammer K., Leinfelder W., Bock A. Nature 342:453-456(1989).

[8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene [...]:17-26(1988).
[9] LeBlanc D.J., Lee L.N., Titmas B.M., Smith C.J., Tenover F.C. J. Bacteriol. 170:3618-3626(1988).
[10] [...] E., Sharma S.B., Maillet F., Vasse J., Truchet G., Rosenberg C. [...]
[11] Plunkett G. III, Burland V.D., Daniels D.L., Blattner F.R. Nucleic Acids Res. 21:3391-3398(1993).

[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987).

[0681] 228. GTP cyclohydrolase II [...]

229. Galactose-1-phosphate uridyl transferase signatures (GalP_UDP_transf)

[...] patterns for both families were developed. For class-I enzymes the signature is based on the active site residues.

[...] Note: class-I enzymes are structurally related to the HIT family of proteins (see <PDOC00694>).

[1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[2] Mollet B., Pilloud N. J. Bacteriol. 173:4464-4473(1991).

[0685] 230. Gamma-thionins family signature 

[0686] The following small plant proteins are evolutionarily related:

- Gamma-thionins from wheat endosperm (gamma-purothionins) and barley (gamma-hordothionins) which are toxic to animal cells and inhibit protein synthesis in cell free systems [1].

- [...] thaliana [3].

- Inhibitors of insect alpha-amylases from sorghum [4].

- Probable protease inhibitor P322 from potato.

- [...] terminus and a proline-rich C-terminal domain.

- Soybean sulfur-rich protein SE60 [7].

- Vicia faba antibacterial peptides fabatin-1 and -2.

[0687] In their mature form, these proteins generally consist of about 45 to 50 amino-acid residues. As shown in the following schematic representation, these peptides contain eight conserved cysteines involved in disulfide bonds.



xCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC 



'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern. 
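The schematic encodes the spacing of the eight conserved cysteines as a string in which `C` marks a cysteine and `x` any other residue. A short Python sketch can read the cysteine positions and inter-cysteine gaps back out of that layout; the string is copied from the schematic above, the helper `cysteine_layout` is a hypothetical name, and the positions describe the generic layout only, not numbering in any particular mature peptide.

```python
# Schematic copied from the text above; 'C' marks a conserved cysteine.
SCHEMATIC = "xCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC"


def cysteine_layout(schematic: str) -> tuple[list[int], list[int]]:
    """Return 1-based cysteine positions and the spacing between neighbours."""
    positions = [i + 1 for i, residue in enumerate(schematic) if residue == "C"]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return positions, gaps


positions, gaps = cysteine_layout(SCHEMATIC)
print(len(positions), positions, gaps)
```

The closing `C-x-C-x(3)-C` spacing recovered this way corresponds to the C-terminal end of the layout, which is where the consensus pattern's final cysteines fall.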

[0688] Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C [The four C's are involved in disulfide bonds]-

[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M. Biochemistry 32:715-724[...]

[2] [...] Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet. [...]

[3] Terras F.R.G., Torrekens S., van Leuven F., Osborn R.W., Vanderleyden J., Cammue B.P.A., Broekaert W.F. FEBS Lett. 316:233-240(1993).

[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).
[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).
[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).


[0692] Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]- 
[0693] [1] Lane B.G. FASEB J. 8:294-301(1994).
[0694] 233. (GlutR)

[...] conserved region [...] positions 99 to 122 in the sequence of known GluTR. This region seems important for [...]

[0700] 234. (Glycoprotease) 

is GCP is highly similar to the following uncharacterized proteins: 

- Escherichia coli hypothetical protein ygjD (ORF-X). 

- Bacillus subtilis hypothetical protein ydiE. 

. Mycobacterium leprae hypothetical protein U229E. 

so . Mycobacterium tuberculosis hypothetical protein MtCY78.10. 

- Synechocystis strain PCC 6803 hypothetical protein si r0807 . 

- Methanococcus jannaschii hypothetical protein MJ11 30. 

- Haloarcula marismortui hypothetical protein in HSH 3' region. 

- Yeast hypothetical protein YKR038c. 
- Yeast hypothetical protein QRI7.

[0702] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in coordinating a metal ion such as zinc.
[0703] Consensus pattern: [KRHGSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]. Sequences known to belong to this class detected by the pattern ALL.
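The statement "Sequences known to belong to this class detected by the pattern ALL" is a claim about sensitivity: every known family member matches. One simple way to check such a claim is to run the pattern over a set of member sequences and list any misses. The sketch below hand-translates the glycoprotease pattern into a regex; the member names and sequences are made-up placeholders, not real proteins.

```python
import re

# Hand translation of the glycoprotease pattern:
# [KRHGSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]
GLYCOPROTEASE = re.compile(r"[KRHGSAT].{4}[FYWLH][DQNGK].P.[LIVMFY].{3}H.{2}[AG]H[LIVM]")


def pattern_coverage(members: dict[str, str]) -> tuple[float, list[str]]:
    """Return the fraction of members the pattern detects and the names it misses."""
    missed = [name for name, seq in members.items() if GLYCOPROTEASE.search(seq) is None]
    return 1 - len(missed) / len(members), missed


# Placeholder sequences: one built to contain the motif, one that lacks it.
toy_members = {
    "member_a": "MM" + "KAAAAFDAPALAAAHAAAHL" + "GG",
    "member_b": "MSTNPQRS",
}
print(pattern_coverage(toy_members))
```

In the document's terms, a pattern is "ALL" when the list of misses is empty; an entry such as the glypican one, which detects all members "except for dally", corresponds to a non-empty list.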
[0704] Note: these proteins belong to family M22 in the classification of peptidases [2,E1].

[ 1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0705] 235. (Glucosamine_iso) 

[ 1] Oliva G., Fontes M.R.M., Garratt R.C., Altamirano M.M., Calcagno M.L., Horjales E. Structure 3:1323-1332
[ 2] Reizer J., Ramseier T.M., Reizer A., Charbit A., Saier M.H. Jr. Microbiology 142:231-250(1996).
[0707] 236. Pneumovirus attachment glycoprotein G (glycoprotein G) 

[0708] This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has
thases may be distant members of this family. 

abequose, to a range of substrates including cellulose, dolichol phosphate and teichoic acids. 
[0714] 239. (Glucos_transf_3) 

Pyrimidine-nucleoside phosphorylase (EC 2.4.2.2) (gene pdp) [3] is an enzyme

class detected by the pattern ALL.

[ 1] Walter M.R., Cook W.J., Cole L.B., Short S.A., Koszalka G.W., Krenitsky T.A., Ealick S.E. J. Biol. Chem. 265:

[ 3] Saxild H.H., Andersen L.N., Hammer K. J. Bacteriol. 178:424-434(1996).

[0720] 240. Glycos_transf_4. Glycosyl transferase. Number of members: 44. 
[0721] [1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P transferases.

Lehrman MA; Glycobiology 1994;4:768-771.

[0722] 241. Glycosyl hydrolases family 15. 21 members.

[0723] 242. Glycosyl hydrolases family 16 signature

It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of sequence similarities: - Bacterial beta-1,3-1,4-glucanases, or lichenases (EC 3.2.1.73), mainly from Bacillus but also from Clostridium thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA). - Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA). - Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1). - Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA). - Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA). Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity.

[0724] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues].
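Because the pattern marks which positions are catalytic, a scanner can report the active-site residues directly instead of only the matched span. The sketch below hand-translates the family 16 pattern into a regex (writing `x(0,1)` as `.?`) and captures the two glutamates as named groups; the function name and test sequences are illustrative assumptions.

```python
import re

# Family 16 signature E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA],
# with the two active-site glutamates captured as named groups.
GH16_SIGNATURE = re.compile(
    r"(?P<glu1>E)[LIV]D[LIV].?(?P<glu2>E).{2}[GQ][KRNF].[PSTA]"
)


def active_site_glutamates(sequence: str) -> list[tuple[int, int]]:
    """Return 1-based positions of the two catalytic E's for each hit."""
    return [
        (m.start("glu1") + 1, m.start("glu2") + 1)
        for m in GH16_SIGNATURE.finditer(sequence)
    ]


# Hypothetical sequences covering both lengths of the x(0,1) gap.
print(active_site_glutamates("MMEIDLEAAGKAPMM"))  # gap of 0 residues
print(active_site_glutamates("EIDLAEAAGKAP"))     # gap of 1 residue
```

The variable gap is why the distance between the two glutamates differs by one between the two example hits.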

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994). 

[0725] 243. Glycosyl hydrolases family 17 signature
It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of sequence similarities: - Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall polysaccharides. - Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during sporulation. - Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants. The best conserved region in the sequence of these enzymes is located in their central section. This region contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2], and it also contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.B., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:2785-2789(1994).

[0726] 244. Glyoxalase I signatures
Glyoxalase I (EC 4.4.1.5) (lactoylglutathione lyase) catalyzes the first step of the glyoxal pathway, the conversion of methylglyoxal and glutathione into S-lactoylglutathione, which is then converted by glyoxalase II to lactic acid [1]. Glyoxalase I is a ubiquitous enzyme which binds one mole of zinc per subunit. The bacterial and yeast enzymes are monomeric while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. In bacteria and mammals the enzyme is a protein of about 130 to 180 residues, while in fungi it is about twice as long. In fungi, the enzyme is built out of a tandem repeat of a homologous domain. Two signature patterns for this family were derived. The first one is located in the N-terminal region, while the second one is located in the central section of the protein and contains a conserved histidine that could be implicated in the binding of the zinc atom.
[0727] Consensus pattern: [HQHIVT]-x-[LIVFY]-x-^ 
Consensus pattern: G-[NTKQ]-x(0,5)-[GA]-^ 

[0728] [ 1] Kim N.-S., Umezawa Y., Ohmura S., Kato S. J. Biol. Chem. 268:11217-11221(1993).
[0729] 245. (Glypican) 

Glypicans [1,2] are a family of heparan sulfate proteoglycans which are anchored to cell membranes by a glycosylphosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains:

a) A signal sequence;

b) An extracellular domain of about 500 residues that contains 12 conserved cysteines probably involved in disulfide bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains;
bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains; 

c) A C-terminal hydrophobic region which is post-translationally removed after formation of the GPI-anchor. 
[0730] The proteins known to belong to this family are:

- Glypican 1 (GPC1). 

- Glypican 3 (GPC3). Defects in GPC3 are the cause of Simpson-Golabi-Behmel syndrome (SGBS).

- K-glypican. 

- Glypican 5 (GPC5). 

- Drosophila protein dally. 

[0731] The signature pattern that was developed for glypicans is located in the central section of the extracellular 
domain and contains five of the conserved cysteines. 
[0732] Consensus pattern: C-x(2)-C-x-G-

involved in disulfide bonds]. Sequences known to belong to this class detected by the pattern ALL, except for dally.

[ 1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996).
[ 2] Watanabe K., Yamada H., Yamaguchi Y. J. Cell Biol. 130:1207-1218(1995).

Granins (chromogranins and secretogranins) [1] are a family of acidic proteins present in the secretory granules of a wide variety of endocrine and neuroendocrine cells. The exact function(s) of these proteins is not yet known, but they seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of peptide hormones and neuropeptides. Three members of this family of proteins show some sequence similarities: - Chromogranin A (CGA) [2]. CGA is a protein of about 420 residues; it is the precursor of the peptide pancreastatin, which strongly inhibits glucose-induced insulin release from the pancreas. - Secretogranin 1 (chromogranin B). A sulfated protein of about 600 residues. - Secretogranin 2 (chromogranin C). A sulfated protein of about 650 residues. Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share many structural similarities. Only one short region, located in the C-terminal section, is conserved in all of them. Chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine residues involved in a disulfide bond.

[0734] Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L-…
Consensus pattern: C-… [… linked by a disulfide bond].

[ 1] Huttner W.B., Gerdes H.-H., Rosa P. Trends Biochem. Sci. 16:27-30(1991). 
[ 2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989).

The grpE protein [1] stimulates, jointly with dnaJ, the ATPase activity of the dnaK chaperone. It seems to accelerate the release of ADP from dnaK, thus allowing dnaK to recycle more efficiently. GrpE is a protein of about 22 to 25 Kd. In yeast, an evolutionarily related mitochondrial protein (gene GRPE) has been shown [2] to associate with the mitochondrial hsp70 protein and thus to play a role in the import of proteins from the cytoplasm. As a signature pattern, the most conserved region of grpE was selected. It is located in the C-terminal section.

[0736] Consensus pattern: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-[RI]-x-[SA]-x-V-x-[IV].

[ 1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).

[ 2] Bolliger L., Deloche O., Glick B.S., Georgopoulos C., Jenoe P., Kronidou N., Horst M., Morishima N., Schatz G. EMBO J. 13:1998-2006(1994).

[0737] 248. Guanylate kinase signature and profile
Guanylate kinase (EC 2.7.4.8) (GK) [1] catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast) and in vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [2,3,4] to be structurally similar to the following proteins: - Protein A57R (or SalG2R) from various strains of Vaccinia virus. This protein is highly similar to GK, but contains a frameshift mutation in the N-terminal section and could therefore be inactive in that virus.

The following proteins are characterized by the presence in their sequence of one or more copies of the DHR domain, an SH3 domain (see <PDOC50002>) as well as a C-terminal GK-like domain; these proteins are collectively known as MAGUKs (membrane-associated guanylate kinase homologs) [5]: - Drosophila lethal(1)discs large-1 protein (gene dlg1). This protein is associated with septate junctions in developing flies, and defects in the dlg1 gene affect the imaginal disks. - Mammalian tight junction protein Zo-1. - A family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits. This family includes SAP97/DLG1 and SAP102. - Vertebrate 55 Kd erythrocyte membrane protein. - Caenorhabditis elegans protein lin-2, which may play a structural role in the induction of the vulva. - Rat protein CASK. - Human protein DLG2. - Human protein DLG3.

There is an ATP-binding site (P-loop) in the N-terminal section of GK. As a signature pattern, a region was selected that contains two arginines and a tyrosine which are involved in GMP binding.
[0738] Consensus pattern: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK].

[ 1] Stehle T., Schulz G.E. J. Mol. Biol. 224:1127-1141(1992).
[ 2] Bryant P.J., Woods D.F. Cell 68:621-622(1992).
[ 3] Goebl M.G. Trends Biochem. Sci. 17:99-99(1992).

[ 4] Zschocke P.D., Schiltz E., Schulz G.E. Eur. J. Biochem. 213:253-269(1993).
[ 5] Woods D.F., Bryant P.J. Mech. Dev. 44:85-89(1994). 

[0739] 249. (Glyco_hydro_35) 

Sequences known to belong to this class detected by the pattern ALL. 

[ 1] Taron C.H., Benner J.S., Hornstra L.J., Guthrie E.P. Glycobiology 5:603-610(1995).

[ 2] Carey A.T., Holt K., Picard S., Wilde R., Tucker G.A., Bird C.R., Schuch W., Seymour G.B. Plant Physiol. 108:1099-1107(1995).

[ 3] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).

[ 4] Henrissat B., Callebaut I., Fabrega S., Lehn P., Mornon J.-P., Davies G. Proc. Natl. Acad. Sci. U.S.A. 92:7090-7094(1995).

[0744] 250. (Glyco_hydro_16)
of sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases (EC 3.2.1.73), mainly from Bacillus but also from Clostridium thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).

- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).

- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).

- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).

- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

[0746] Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).
[0748] 251. (Glyco_hydro_17)
Glycosyl hydrolases family 17 signature

basis of sequence similarities:

- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

Sequences known to belong to this class detected by the pattern ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

2785-2789(1994).

[0752] 252. (Glyco_hydro_3) 

classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various organisms, including Saccharomycopsis fibuligera and Butyrivibrio fibrisolvens (bglA).

- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).

[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0756] 253. (Glyco_hydro_28)

Polygalacturonase active site (aka PG). Polygalacturonase (EC 3.2.1.15) [1,2] catalyzes the random hydrolysis of 1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans.

… end, releasing digalacturonate.
Note: these proteins belong to family 28 in the classification of glycosyl hydrolases [5].

[ 1] Ruttkowski E., Labitzke R., Khanh N.Q., Loeffler F., Gottschalk M., Jany K.-D. Biochim. Biophys. Acta 1087:104-106(1990).

[ 2] Huang J., Schell M.A. J. Bacteriol. 172:3879-3887(1990).

[ 3] He S.Y., Collmer A. J. Bacteriol. 172:4988-4995(1990).

[ 4] Bussink H.J.D., Buxton F.P., Visser J. Curr. Genet. 19:467-474(1991).
[ 5] Henrissat B. Biochem. J. 280:309-316(1991). 

[0762] 254. (Glyco_hydro_32) 
basis of sequence similarities: 

- Raffinose invertase (EC 3.2.1.26) (gene rafD) from Escherichia coli plasmid pRSD2.

- Levanase (EC 3.2.1.65) (gene sacC) from Bacillus subtilis.

to this class detected by the pattern ALL.

Reddy V.A., Maley F. J. Biol. Chem. 265:10817-10820(1990).

[0766] 255. (Glyco_hydro_1) 

classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus polymyxa, and Caldocellum saccharolyticum.

… chrysanthemi (gene arbB).

- Plant myrosinases (EC 3.2.3.1) (sinigrinase).

hydrolases.
acting as a nucleophile. This region was used as … [E is the active site residue]. Sequences known to belong to this class detected by the pattern ALL.

[0770] Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH precursor, lack the active site glutamate and may therefore be inactive [4].

Sequences known to belong to this class detected by the pattern ALL.

[0772] Note: this pattern will pick up the last three domains of LPH.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

5887-5889(1990). 

[0773] 256. Glyco_hydro_20 

Glycosyl hydrolase family 20 

Previous Pfam IDs: glycosyljiydrt 1 ; 

Number of members: 33 

[0774] 257. (Glyco_hydro_9) 

Glycosyl hydrolases family 9 active sites signatures 

(aka Glycosyl_hydr12). The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).

- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).

- Clostridium cellulovorans endoglucanase C (engC).

- Clostridium stercorarium endoglucanase Z (avicelase) (celZ).

- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).

- Fibrobacter succinogenes endoglucanase A (endA).

- Pseudomonas fluorescens endoglucanase A (celA).

- Streptomyces reticuli endoglucanase 1 (cel1).

- Thermomonospora fusca endoglucanase E-4 (celD).

… fruit ripening process.

Consensus pattern: …-[PLIVM]-H-x-R [H is an active site residue].
Consensus pattern: …-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues].

sequence seems to be incorrect. 

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990). 
…, Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).



Glutamine synthetase (GS) catalyzes the condensation of glutamate and ammonia to form glutamine. … - Class I enzymes (GSI) are specific to prokaryotes and are oligomers of identical subunits. The activity of GSI-type enzymes is controlled by adenylation; the adenylated enzyme is inactive. - Class II enzymes (GSII) are found in eukaryotes and in bacteria belonging to … (these bacteria have also a class-I GS). Plants have two or more isozymes of GSII; one of the isozymes is translocated into the chloroplast. - Class III enzymes (GSIII) have currently only been found in Bacteroides fragilis and in …. GSIII is a hexamer of identical chains. It is much larger than the GSI or GSII (350 to 420 amino acids) enzymes.

… is reversibly adenylated. Consensus pattern: …-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]-…
… delta, epsilon and zeta chains in mammals, for example. A wide variety of globins are found in … leguminous and non-leguminous plants [13] and from

[0789] 262. Fructose-bisphosphate aldolase
[0790] Fructose-bisphosphate aldolase [1,2] catalyzes the reversible aldol cleavage of fructose-1,6-bisphosphate into dihydroxyacetone phosphate and glyceraldehyde 3-phosphate. There are two classes of fructose-bisphosphate aldolases; class-I enzymes act through a Schiff-base intermediate formed with an active-site lysine, which lies in a region conserved in this class of enzymes.

[0791] Consensus pattern: [LIVM]-x-…-[LIVMFW]-E-G-x-[LS]-L-K-[PHSN] [K is involved in Schiff-base formation].

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).

… The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), … Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family G [3] or as the glycosyl hydrolases family 11 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Aspergillus awamori xylanase C (xynC). - Bacillus circulans, pumilus and subtilis xylanase (xynA). - Clostridium acetobutylicum xylanase (xynB). - Fibrobacter succinogenes xylanase C (xynC) and Neocallimastix patriciarum xylanase A (xynA), which consist of two catalytic domains. - Ruminococcus flavefaciens bifunctional xylanase. This protein consists of three domains: an N-terminal xylanase catalytic domain that belongs to this family, … and a domain that belongs to family 10 of glycosyl hydrolases. Two conserved regions were used as signature patterns.

Consensus pattern: …-x-[FYWHN] [E is an active site residue].

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] …, Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[ 3] … Biochem. J. 288:117-121(1992).

[0794] 264. Glycosyl hydrolase family 14 
[0795] The members of this family are beta-amylases.
… mammalian lactase-phlorizin hydrolase (LPH), a protein of about 1900 residues which contains four

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH
[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

5887-5889(1990). 

Consensus pattern: … [… active site residue].

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 137:369-380(1991).
…, Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

into a single family: 

… Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus. - Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
active site residue].

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0803] 268. Glycosyl hydrolases family 8

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), … Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as … or as the glycosyl hydrolases family 8 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Acetobacter xylinum endonuclease cmcAX. - Bacillus … - … 2 (celB). - Cellulomonas uda endoglucanase. - … (celY). - Clostridium thermocellum endoglucanase A (celA). - Bacillus circulans beta-glucanase (EC …). - Escherichia coli … The most conserved region in these enzymes … aspartate. The first aspartate is thought [5] to …

… residue].

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

[0804] 269. Glycosyl hydrolases family 9

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), … [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as … or as the glycosyl hydrolases family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below. - Butyrivibrio fibrisolvens cellodextrinase 1 (ced1). - Cellulomonas fimi endoglucanases B (cenB) and C (cenC). - Clostridium cellulolyticum endoglucanase G (celCCG). - Clostridium cellulovorans endoglucanase C (engC). - Clostridium stercorarium endoglucanase Z (avicelase) (celZ). - Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI). - Fibrobacter succinogenes endoglucanase A (endA). - Pseudomonas fluorescens endoglucanase A (celA). - Streptomyces reticuli endoglucanase 1 (cel1). - Thermomonospora fusca endoglucanase E-4 (celD). - Dictyostelium discoideum spore germination specific endoglucanase 270-6. This enzyme may digest the spore cell wall during germination, to release the enclosed amoeba. - Endoglucanases from avocado or French bean. In plants this enzyme may be involved in the fruit ripening process. The most conserved regions in these enzymes are centered on conserved residues which have been shown, in endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The first region contains … These regions were used as signature patterns.

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
…, Claeyssens M. J. Biol. Chem. 266:10313-10318.

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992). 
[0806] 270. Glyceraldehyde 3-phosphate dehydrogenase

Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) is an enzyme common to both the glycolytic and gluconeogenic pathways. A cysteine in the active site is involved in forming a covalent phosphoglycerol thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial … as well as in the highly divergent archaebacterial GAPDHs.

[0808] 271. Granulins signature

Granulins [1] are a family of cysteine-rich … biological activity. A precursor …

… below: … 'C': conserved cysteine probably

… that resembles other viral RNA polymerases. … The resulting minus strand is used as a template for the

EMBO J 1996;15:12-22. [3] Ishido S, Fujita T, Hotta H;
Biochem Biophys Res Commun 1998;244:35-40.

[0815] 274. HIT family signature

Recently a family of small proteins of about … to 16 Kd has been described [1]. This family currently consists of: - Mammalian protein HINT (also known as protein kinase C inhibitor). HINT was incorrectly thought to be a specific inhibitor of PKC. - A diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17), which cleaves Ap4A to yield AMP and ATP. - FHIT, a human protein whose gene is altered in …, which acts [3] as a diadenosine 5',5'''-P1,P3-triphosphate hydrolase (Ap3Aase) that cleaves Ap3A to yield AMP and ADP. - Yeast proteins HNT1 and HNT2. - Maize zinc-binding protein ZBP14. - Escherichia coli hypothetical protein ycfF. - Haemophilus influenzae hypothetical protein HI0961. - Methanococcus jannaschii hypothetical protein MJ0866. - Mycobacterium leprae … - … protein slr1234. - Caenorhabditis elegans … - A hypothetical … protein in Azospirillum brasilense. - A hypothetical 12.4 Kd protein in the psbAII 5' region in … These proteins share a conserved region with three clustered histidines. This region is responsible for the designation of this family: HIT, for 'Histidine Triad' [1]. This region was originally thought to be implicated in …

[ 1] Seraphin B. DNA Seq. 3:177-179(1992).
… Huebner K. Biochemistry 35:11529-11535(1996).

[ 4] Brenner C, Garrison P., Gilmour J., Peisach D., Ringe D., Petsko G.A., Lowenstein J.M. Nat. Struct. Biol. 4: 
231-238(1997). 

[0817] 275. Myc-type, 'helix-loop-helix' dimerization domain signature (HLH)

A number of eukaryotic proteins, which probably are sequence-specific DNA-binding proteins that act as transcription factors, share a conserved domain of 40 to 50 amino acid residues. It has been proposed [1] that this domain is formed of two amphipathic helices joined by a variable-length linker region that could form a loop. This 'helix-loop-helix' (HLH) domain mediates protein dimerization and has been found in the proteins listed below [2,3,E1,E2]. Most of these proteins have an extra basic region of about 15 amino acid residues that is adjacent to the HLH domain and specifically binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups, class A (ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on the core sequence CANNTG, also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of, but necessary for, DNA binding, as two basic regions are required for DNA-binding activity. The HLH proteins lacking the basic domain (Emc, Id) function as negative regulators since they form heterodimers but fail to bind DNA. The hairy-related proteins (hairy, E(spl), deadpan) also repress transcription although they can bind DNA. The proteins of this subfamily act together with co-repressor proteins, like groucho, through their C-terminal motif WRPW.
- The myc family of cellular oncogenes [4], which is currently known to contain four members: c-myc [E3], N-myc, L-myc and B-myc. The myc genes are thought to play a role in cellular differentiation and proliferation.
- Proteins involved in myogenesis (the induction of muscle cells): in mammals MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or herculin); in birds CMD1 (QMF-1); in Xenopus MyoD and MF25; in Caenorhabditis elegans CeMyoD; and in Drosophila nautilus (nau).
- Vertebrate proteins that bind specific DNA sequences ('E boxes') in various immunoglobulin chain enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4), TFE3, and TFEB.
- Vertebrate neurogenic differentiation factor 1, which acts as a differentiation factor during neurogenesis.
- Vertebrate MAX protein, a transcription regulator that forms a sequence-specific DNA-binding protein complex with myc or mad.
- Vertebrate Max Interacting Protein 1 (MXI1 protein), which acts as a transcriptional repressor and may antagonize myc transcriptional activity by competing for max.
- Proteins of the bHLH/PAS superfamily, which are transcriptional activators. In mammals: AH receptor nuclear translator (ARNT), single-minded homologs (SIM1 and SIM2), hypoxia-inducible factor 1 alpha (HIF1A), AH receptor (AHR), neuronal PAS domain proteins (NPAS1 and NPAS2), endothelial PAS domain protein 1 (EPAS1), mouse ARNT2, and human BMAL1. In Drosophila: single-minded (SIM), AH receptor nuclear translator (ARNT), trachealess protein (TRH) and similar protein (SIMA).
- Mammalian transcription factors HES, which repress transcription by acting on two types of DNA sequences, the E box and the N box.
- Mammalian MAD protein (max dimerizer), which acts as a transcriptional repressor and may antagonize myc transcriptional activity by competing for max.
- Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a symmetrical DNA sequence that is found in a variety of viral and cellular promoters.
- Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia.
- Human transcription factor AP-4.
- Mouse helix-loop-helix proteins MATH-1 and MATH-2, which activate E box-dependent transcription in collaboration with E47.
- Mammalian stem cell protein (SCL) (also known as tal1), a protein which may play an important role in hemopoietic differentiation. SCL is involved, by chromosomal translocation, in stem-cell leukemia.
- Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a basic DNA-binding domain but are able to form heterodimers with other HLH proteins, thereby inhibiting binding to DNA.
- Drosophila extra-macrochaetae (emc) protein, which participates in sensory organ patterning by antagonizing the neurogenic activity of the achaete-scute complex. Emc is the homolog of mammalian Id proteins.
- Human Sterol Regulatory Element Binding Protein 1 (SREBP-1), a transcriptional activator that binds to the sterol regulatory element 1 (SRE-1) found in the flanking region of the LDLR gene and in other genes.
- Drosophila achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute), T5 (achaete) and T8 (asense). The AS-C proteins are involved in the determination of the neuronal precursors in the peripheral nervous system and the central nervous system.
- Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2 proteins.
- Drosophila atonal protein (ato), which is involved in neurogenesis.
- Drosophila daughterless (da) protein, which is essential for neurogenesis and sex determination.
- Drosophila deadpan (dpn), a hairy-like protein involved in the functional differentiation of neurons.
- Drosophila delilah (dei) protein, which plays an important role in the differentiation of epidermal cells into muscle.
- Drosophila hairy (h) protein, a transcriptional repressor which regulates embryonic segmentation and adult bristle patterning.
- Drosophila enhancer of split proteins E(spl), which are hairy-like proteins active during neurogenesis and also act as transcriptional repressors.
- Drosophila twist (twi) protein, which is involved in the establishment of germ layers in embryos.
- Maize anthocyanin regulatory proteins R-S and LC.
- Yeast centromere-binding protein 1 (CPF1 or CBF1). This protein is involved in chromosomal segregation. It binds to a highly conserved DNA sequence found in centromeres and in several promoters.
- Yeast INO2 and INO4 proteins.
- Yeast phosphate system positive regulatory protein PHO4, which interacts with the upstream activating sequence of several acid phosphatase genes.
- Yeast serine-rich protein TYE7, which is required for Ty-mediated ADH2 expression.
- Neurospora crassa nuc-1, a protein that activates the transcription of structural genes for phosphorus acquisition.

…-[LIVMSTACKR]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMRKHQ].

[ 1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).
[ 2] Garrell J., Campuzano S. BioEssays 13:493-498(1991).
[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992).

… chromosomal protein H6 (histone T) also belongs to this family. As a signature pattern, a conserved stretch of 10 residues located in the N-terminal section of HMG14 and HMG17 was selected.
[0820] Consensus pattern: R-R-S-A-R-L-S-A-[RK]-P.
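A pattern's span follows directly from its elements: each single element contributes one residue, and `x(m,n)` contributes between m and n. The sketch below is a hypothetical helper that assumes the element syntax used in this document. It confirms that the HMG14/HMG17 pattern covers exactly the 10-residue stretch described and shows how a variable gap widens the range.

```python
import re

# Matches a trailing repeat count such as (3) or (16,20).
_REPEAT = re.compile(r"\((\d+)(?:,(\d+))?\)$")


def pattern_length_range(pattern: str) -> tuple[int, int]:
    """Return (min, max) number of residues spanned by a PROSITE-style pattern."""
    shortest = longest = 0
    for element in pattern.rstrip(".").split("-"):
        m = _REPEAT.search(element)
        if m is None:
            shortest += 1   # one residue: a letter, x, [..] or {..}
            longest += 1
        else:
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) is not None else low
            shortest += low
            longest += high
    return shortest, longest


print(pattern_length_range("R-R-S-A-R-L-S-A-[RK]-P"))
print(pattern_length_range("E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA]"))
```

For the family 16 glycosyl hydrolase pattern, the single `x(0,1)` element is what makes the span vary by one residue; the GrpE pattern's `x(16,20)` element likewise gives it a span that varies by four.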
[0821] [ 1] Bustin M., Reeves R. Prog. Nucleic Acid Res. Mol. Biol. 54:35-100(1996).

[0822] 277. Hydroxymethylglutaryl-coenzyme A lyase

Hydroxymethylglutaryl-coenzyme A lyase (EC 4.1.3.4) catalyzes the transformation of HMG-CoA into acetyl-CoA and acetoacetate. … (gene mvaB) … A cysteine has been shown [2] to be required for the activity of the enzyme. The region around this residue is perfectly conserved and is used as a signature pattern.
[0823] Consensus pattern: S-V-A-G-L-G-G-C-P-Y [C is the active site residue].

[ 1] Mitchell G.A., Robert M.-F., Hruz P.W., Wang S., Fontaine G., Behnke C.E., Mende-Mueller L.M., Schappert K., Lee C., Gibson K.M., Miziorko H.M. J. Biol. Chem. 268:4376-4381(1993).
[ 2] Hruz P.W., Narasimhan C, Miziorko H.M. Biochemistry 31:6842-6847(1992). 

[0824] Alpha-isopropylmalate and homocitrate synthases signatures (HMGL2)

[...] proteins contain two conserved histidine residues which could be implicated in the catalytic mechanism.
[0825] Consensus pattern: L-R-[DE]-G-x-Q-x(10)-K-[...]
Consensus pattern: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI]-[...]
[0826] [1] Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. J. Bacteriol. 173:3041-3046(1991).
[0827] 278. (HMG_CoA_synt) Hydroxymethylglutaryl-coenzyme A synthase active site

[...] site residue as a signature pattern.

[0828] Consensus pattern: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G [C is the active site residue]

[0829] [1] Rokosz L.L., Boulton D.A., Butkiewicz E.A., Sanyal G., Cueto M.A., Lachance P.A., Hermes J.D. Arch.
Biochem. Biophys. 312:1-13(1994).
[0830] 279. HMG (high mobility group) box 

HSF 



[0833] Consensus pattern: L-x(3)-[...]


[0834] 281. Heat shock hsp20 proteins family profile

[...] environmental stress by inducing the synthesis [...] the hsp70 family of proteins were derived [...]

[1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).

[8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991).

[0839] 283. Heat shock hsp90 proteins family signature

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the
synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an
average molecular weight of 90 Kd, known as the hsp90 proteins. Proteins known to belong to this family are:
- Escherichia coli and other bacteria [...]. - Vertebrate hsp 90-alpha (hsp 86) and hsp 90-beta (hsp 84). - Drosophila
hsp 82. - Plants Hsp82 or Hsp83. - Yeast and other fungi HSC82 and HSP82. - The endoplasmic [...] (also known as Erp99 in
mouse, [...]).

[...] As a signature pattern, a conserved region in the [...] part of these proteins was selected.
[0840] Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED] 
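Sequence coordinates are conventionally reported as 1-based, inclusive residue ranges. As a hedged sketch (the function name and the test sequence are illustrative assumptions), the hsp90 pattern above can be written as the regular expression `Y.[NQH]K[DE][IVA]FLR[ED]`, and each hit reported that way:

```python
import re
from typing import List, Tuple

# Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED] written as a regular expression.
HSP90_SIGNATURE = re.compile(r"Y.[NQH]K[DE][IVA]FLR[ED]")


def scan_hsp90(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (first residue, last residue, matched text) for each hit, 1-based and inclusive."""
    # m.start() is 0-based, so +1 gives the first residue; m.end() is exclusive,
    # which makes it equal to the 1-based index of the last matched residue.
    return [(m.start() + 1, m.end(), m.group())
            for m in HSP90_SIGNATURE.finditer(sequence.upper())]


print(scan_hsp90("AAYSNKEIFLREGG"))  # [(3, 12, 'YSNKEIFLRE')]  (invented sequence)
```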

[3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).

[0843] 285. Heme oxygenase signature

Heme oxygenase (EC 1.14.99.3) (HO) [1] is an enzyme that, in animals, carries out the oxidation of heme: it cleaves the
heme ring at the alpha methene bridge to form biliverdin and carbon monoxide. Biliverdin is subsequently converted to
bilirubin by biliverdin reductase. [...] HO-1 to HO-3. The first two isozymes differ in their tissue expression and in
their inducibility: HO-1 is highly inducible by its substrate heme and by various non-heme substances, while HO-2 is
non-inducible. It has been suggested [2] that HO-2 could be implicated in the production of carbon monoxide [...]. Heme
oxygenase is also encoded in the genome of the chloroplast of red algae as well as in cyanobacteria [...].

[1] Maines M.D. FASEB J. 2:2557-2568(1988).

[...] Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).
[0845] 286. Hepatitis core antigen [...] carboxyl terminus rich in arginine. On this basis it was
[0851] 288. Histone deacetylase family. Regulation of transcription is caused in part

[0854] 289. Histidinol dehydrogenase signature

Histidinol dehydrogenase (EC 1.1.1.23) (HDH) catalyzes the terminal step in the biosynthesis of histidine in bacteria,
fungi, and plants: the four-electron oxidation of L-histidinol to histidine. In bacteria, HDH is a single chain
polypeptide; in fungi it is the C-terminal domain of a multifunctional enzyme which catalyzes three different steps of
histidine biosynthesis; and in plants it is expressed as [...]. As a signature pattern, a highly conserved region [...]
of HDH was selected. This region does not [...] correspond to the part of the enzyme that, in most, but not all [...],
contains a cysteine residue which, in Salmonella typhimurium, has been said [2] [...].

[0855] Consensus pattern: [...]-D-x(2)-A-[...]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H-[...]

[1] Nagai A., Ward E., Beck J., Tada S., Chang J.-Y., Scheidegger A., Ryals J. [...]

[0856] 290. Homoserine dehydrogenase signature

Homoserine dehydrogenase (EC 1.1.1.3) (HDH) [1,2] catalyzes the NAD-dependent reduction of aspartate
beta-semialdehyde into homoserine. This reaction [...] from aspartate to homoserine. The latter participates in the
biosynthesis of threonine and then isoleucine, as well as in that of methionine. HDH is found either [...] or as a
bifunctional enzyme consisting of an N-terminal aspartokinase domain [...].

[1] Thomas D., Barbey R., Surdin-Kerjan Y. [...]
[2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).

[0858] 291. Haloacid dehalogenase-like hydrolase

[0859] This family is structurally different from the alpha/beta hydrolase family (abhydrolase). This family includes
L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure of the family consists of two domains. One
is an inserted four-helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96
of Swiss:P24069. The rest of the fold is composed of the core alpha/beta domain.

[1] [...] Liu JQ, Kurihara T, Esaki N, Soda K; J Biol Chem 1996;271:20322-20330.

[0860] 292. DEAD and DEAH box families ATP-dependent helicases signatures (helicase_C)

A number of eukaryotic and prokaryotic proteins have been characterized on the basis of their structural similarity.
They all seem to be involved in ATP-dependent nucleic acid unwinding. Proteins currently known to belong to this family
are: - Initiation factor eIF-4A. This protein is a subunit of a high molecular weight complex involved in 5'cap
recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA helicase. - PRP5 and PRP28. These yeast
proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process. - PL10, a mouse protein [...].
- [...] a putative RNA helicase, closely related to PL10. - SPP81/DED1 and DBP1, two yeast proteins probably involved in
pre-mRNA splicing and related to PL10. - Caenorhabditis elegans helicase glh-1. - MSS116, a yeast protein required for
mitochondrial splicing. - SPB4, a yeast protein involved in the maturation of [...]. - [...] antigen p68. p68 has ATPase
and DNA-helicase activities in vitro [...] cell growth and division. - Rm62 (p62), a Drosophila [...]. - ROK1, a yeast
protein. - ste13, a fission yeast protein. - [...] a Drosophila protein important for oocyte formation and specification
of embryonic posterior structures. - Me31B, a Drosophila [...] expressed protein of unknown function. - dbpA, an
Escherichia coli putative RNA helicase. - deaD, an Escherichia coli putative RNA helicase which can suppress a mutation
in the rpsB gene for ribosomal protein S2. - [...] rhlE, an Escherichia coli [...]. - [...] It probably interacts with 23S
ribosomal RNA. - Caenorhabditis elegans hypothetical proteins T26G10.1 [...]. - [...] hypothetical protein YHR169w.
- Fission yeast [...]. [...] - mle, a Drosophila protein required in males for [...]. - RAD3 from yeast. RAD3 is a DNA
helicase involved in excision repair of DNA damaged by UV light, bulky adducts or [...]. [...] protein XPD (ERCC-2) are
the [...]. See entry <PDOC00017> [...].

[4] Hodgman T.C. Nature 333:[...]

[...] binding domain called cytochrome b2. [...] of nitrate assimilation in plants, fungi and [...]

This family of proteins also includes:

- TU-36B, a Drosophila muscle protein of unknown function [6].
- Fission yeast hypothetical protein SpAC1F12.10c.
- Yeast hypothetical protein YMR073C.
- Yeast hypothetical protein YMR272C.
[...] Ozols J. Biochim. Biophys. Acta 997:121-130(1989).

[...] Proc. Natl. Acad. Sci. U.S.A. 86:5006-5010(1988).
[6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Roze[...]
[0866] 294. Hexapeptide-repeat containing-transferases signature

[...] belong to a single family. [...] the outer membrane of the cell. - UDP-[...] (gene lpxD or firA) [...].
- Chloramphenicol acetyltransferase (CAT) (EC 2.3.1.28) from Agrobacterium tumefaciens, Bacillus sphaericus, [...]
aeruginosa, Staphylococcus aureus plasmid [...]. These CATs are not evolutionarily related to the main family of CAT
(see <PDOC00093>). - Rhizobium nodulation protein nodL, an acetyltransferase involved in the O-acetylation of Nod
factors. - Bacterial maltose O-acetyltransferase (EC 2.3.1.79). - [...] tetrahydrodipicolinate N-succinyltransferase
[...], which is involved in the biosynthesis of diaminopimelate and lysine.

[0867] Consensus pattern: [...]-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV]-[...]

[...] Downie J.A. Mol. Microbiol. [...]
[2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992).

[0868] 295. Hexokinases signature

Hexokinase (EC 2.7.1.1) [1,2] is an important glycolytic enzyme that catalyzes the phosphorylation of keto- and
aldohexoses (e.g. glucose, [...]). [...] commonly referred to as types I, II, III and IV. Type IV hexokinase, which is
often incorrectly designated glucokinase, [...] plays an important role in modulating insulin secretion; it has a
molecular mass of about 50 Kd. Hexokinases of types I to III, which have low Km values for glucose, [...]. Structurally,
they consist of a very small N-terminal hydrophobic membrane-binding domain [...] and two highly similar domains of 450
residues.

[...] Middleton R.J. [...]
[...] synthesized throughout the cell cycle.

[0872] Consensus pattern: [AC]-G-L-x-F-P-V

[0873] Histone H4 signature (his2)

[0874] Histone H4 is one of the four histones, along with H2A, H2B and H3, which forms the eukaryotic nucleosome core.
The [...] sequence of histone H4 has remained almost [...] which is implicated in DNA-binding [3].
[0875] Consensus pattern: G-A-K-[...]

[...] a highly conserved protein of 135 amino acid residues [...] as a core histone necessary for [...]. [...]-like
domain: - Mammalian centromeric protein CENP-A [3]. [...] - Caenorhabditis elegans chromosome [...]. - Yeast [...].

[...] from a conserved region in the central section of H3.

[1] Wells D.E., Brown D. Nucleic Acids Res. [...]

[2] Thatcher T.H., Gorovsky M.A. [...]

[0878] Histone H2B signature (his4)

Histone H2B is one of the four histones, along with H2A, H3 and H4, which forms the eukaryotic nucleosome core. [...]
[...] that was developed is 24 residues long and spans [...]

[0882] Consensus pattern: [...]
[5] Schofield P.N. Trends Neurosci. 10:3-6(1987).

[...] Caenorhabditis elegans [...] hexapeptide was used for this subfamily of [...].
[0884] Consensus pattern: [...]

[1] McGinnis W., Krumlauf R. Cell 68:283-302(1992).

[0887] Consensus pattern: L-M-A-[EQ]-G-L-[...]



[ 2] Atomi 
[0893] 300. Initiation factor 3 signature

Initiation factor 3 (IF-3) [...] the initiation of protein biosynthesis in bacteria. [...] 30S ribosomal subunit, the
initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit; it enhances the poly(AUG)-dependent
binding of the initiator tRNA [...]. [...] As a signature pattern, [...] residues [...] in the central section [...].

Consensus pattern: [...]
[...] J. Mol. Biol. 203:585-606(1988).

[...] is a single chain enzyme. In others, [...] it is the N-terminal domain of a bifunctional enzyme that also catalyzes
N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step in tryptophan biosynthesis. In fungi, IGPS
is the central domain of a trifunctional enzyme that also contains a PRAI C-terminal domain and a [...]. IGPS contains a
highly conserved region [...] signature pattern for IGPS.

Consensus pattern: [...]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST]-[...]

[...] have been shown [1] to be evolutionarily related: - Dihydroxy-acid dehydratase [...], which catalyzes the fourth
step in the biosynthesis of isoleucine and valine, the dehydration of [...] into alpha-ketoisovaleric acid.
- 6-phosphogluconate dehydratase [...], which catalyzes [...] in the Entner-Doudoroff pathway, the dehydration of [...]
into 6-phospho-2-dehydro-3-deoxy-D-gluconate. - Escherichia coli hypothetical protein [...]. These proteins contain about
600 amino acid residues. [...] The first pattern is located in the N-terminal part, [...] the second in the C-terminal
half.

Consensus pattern: [...]-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand]

[0905] 305. IMP dehydrogenase / GMP reductase signature

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the
NAD-dependent reduction of IMP into XMP. As IMP dehydrogenase is associated with [...], it is a possible target for
cancer chemotherapy. [...] two IMP dehydrogenase isozymes in [...]
[...] residue]

[...] J. Biol. Chem. [...] (1990).
[...] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988).
[0907] 306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain

[0908] [1] York JD, Ponder JW, Chen ZW, Mathews [...]; Biochemistry 1994;33:13164-13171. [2] Jefferson [...] Pot DA,
Williams LT, Majerus PW; J Biol Chem [...] [...] Majerus PW; Proc Natl Acad Sci U S A 1995;92:[...]

[...] FEBS Lett 1991;294:16-18.

[0909] 307. IQ calmodulin-binding motif

[1] Rhoads AR, Friedberg F; FASEB J 1997;11:331-340.

[0910] 308. Inosine-uridine preferring nucleoside hydrolase

Inosine-uridine preferring nucleoside hydrolase (IUNH) is an enzyme first identified in protozoa [1] that catalyzes the
hydrolysis of all of the commonly occurring purine and pyrimidine nucleosides into ribose and the associated base. IUNH
is important for these parasitic organisms, which are deficient in de novo purine synthesis. IUNH from Crithidia
fasciculata [...] is a homotetrameric enzyme of subunits of 34 Kd. A histidine has been shown to be important for the
catalytic mechanism; it acts as a proton donor to activate the hypoxanthine leaving group. IUNH is [...] related to a
number of uncharacterized proteins from various biological sources, notably: - Escherichia coli hypothetical protein
yaaF. - Escherichia coli hypothetical protein ybeK. - Fission yeast hypothetical protein SpAC17G8.02. - Yeast
hypothetical [...]

[0911] Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[IV]-A
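In practice a pattern like this is run against whole sequence files, where a motif can straddle a line break. The following sketch is a hypothetical helper, not a method described in the text, and the FASTA records are invented. It joins each record's wrapped lines before scanning with the IUNH pattern, written as the regular expression `D.D[PT][GA].DD[TAV][IV]A`:

```python
import re
from typing import Dict, List

# D-x-D-[PT]-[GA]-x-D-D-[TAV]-[IV]-A written as a regular expression.
IUNH_SIGNATURE = re.compile(r"D.D[PT][GA].DD[TAV][IV]A")


def parse_fasta(text: str) -> Dict[str, str]:
    """Collect FASTA records, joining wrapped sequence lines into one string."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier = first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def scan_iunh(fasta_text: str) -> Dict[str, List[int]]:
    """Map each record to the 1-based start positions of IUNH signature hits."""
    return {name: [m.start() + 1 for m in IUNH_SIGNATURE.finditer(seq)]
            for name, seq in parse_fasta(fasta_text).items()}


# Invented records; in seqA the motif is split across two lines.
example = ">seqA invented\nMKVLDADPGK\nDDTIAEQ\n>seqB invented\nMKVLEEEQ\n"
print(scan_iunh(example))  # {'seqA': [5], 'seqB': []}
```

Scanning line by line instead would miss the seqA hit, because its first six residues sit on one line and the rest on the next.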

[0912] 309. (Insulinase) Insulinase family, zinc-binding region signature

[...] or lack of activity of those subunits is quite complex:

Note: these proteins belong to family M16 in the classification of peptidases [5].

[0918] 310. Involucrin repeat

[0919] [1] Eckert RL, Yaffe MB, Crish JF, Murthy S, Rorke EA, Welter JF; J Invest Dermatol 1993;100:613-617.

311. Inositol monophosphatase family signatures

It has been shown [1] that several proteins share [...]. Two of these proteins are enzymes of the inositol phosphate
second messenger signalling pathway: - Vertebrate and plant inositol monophosphatase (EC 3.1.3.25). - Vertebrate
inositol polyphosphate 1-phosphatase (EC 3.1.3.57). The function of the other proteins is not yet clear: - Bacterial
protein cysQ, which [...] PAPS (3'-phosphoadenosine 5'-phosphosulfate) or be useful in sulfite synthesis. - Escherichia
coli protein suhB. Mutations in suhB result in the enhanced synthesis of heat shock sigma factor (htpR). - [...]
metabolism. - Emericella nidulans protein qutG, probably involved [...]. - [...] involved in salt tolerance as well as
methionine biosynthesis. - Yeast hypothetical protein YHR046c. - Caenorhabditis elegans hypothetical protein F13G3.5.
- [...] protein upstream of the pss gene for exopolysaccharide synthesis. [...]



[...] Atack J.R. Proc. Natl. Acad. Sci. U.S.A. [...]

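[0924] 313. Ion transport protein

[0925] This family contains sodium, potassium and calcium ion channels. Its members have six transmembrane helices, the
last two of which flank a loop that determines ion selectivity. In some subfamilies (e.g. Na channels) the domain is
repeated four times, whereas in others (e.g. K channels) the protein forms a tetramer in the membrane.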
[...] Isocitrate dehydrogenase (EC 1.1.1.42) (IDH) [...] catalyzes the oxidative decarboxylation of isocitrate into
alpha-ketoglutarate. In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix
(one NAD-dependent, the other NADP-dependent), while the third one is cytoplasmic. In Escherichia coli the activity of a
NADP-dependent form of the enzyme is controlled by the phosphorylation of a serine residue; the phosphorylated form of
IDH is completely inactivated. 3-isopropylmalate dehydrogenase (EC 1.1.1.85) (IMDH) [3,4] catalyzes the third step in
the biosynthesis of leucine, the oxidative decarboxylation of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate
dehydrogenase (EC 1.1.1.93) [5] catalyzes the reduction of tartrate to oxaloglycolate. These enzymes [...] conserved
region of these enzymes [...].

Consensus pattern: [...]-[LIVMPA]-G-[LIVMF]-[...]

[...] Sci. U.S.A. 86:8635-8639(1989).

[0930] [1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia A, Vijayan M; Nat Struct Biol 1996;3:596-603.

[...] bind RNA [...] opsoclonus ataxia.

[0933] 317. Kelch motif

[0934] The kelch motif was initially discovered in Kelch (Swiss:Q04652). In this protein there are six copies of the
motif. [...] galactose oxidase [1], for which a structure has been solved [...].

thiol proteinases and aspartic proteinases, as well as some [...] - Kunitz-type trypsin inhibitor KTi2 from soybean.
[...] which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha-amylase/subtilisin inhibitors from barley
and [...] - Albumin-1 (WBA-1) from goa bean seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet-taste protein.
- [...] major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) from [...] - Wound-responsive protein
gwin3 from poplar tree [7]. - 21 Kd seed protein from cocoa [8]. All these proteins contain from 170 to 200 amino acid
residues and one or two intrachain disulfide bonds. The best conserved region is found in their N-terminal section and
is used as a signature pattern.
[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]
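When a candidate peptide fails a pattern, it can help to know which position it violates. The sketch below is illustrative (the function names and the peptides are invented). It expands the Kunitz signature into one set of allowed residues per position, so `x(5)` becomes five unconstrained positions and the whole pattern spans 17 residues. It then reports the first position a 17-residue peptide violates.

```python
from typing import List, Optional, Set

KUNITZ = "[LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]"


def expand(pattern: str) -> List[Optional[Set[str]]]:
    """One entry per pattern position: a set of allowed residues, or None for x.

    Only single residues, [..] and x(n) are handled, which suffices for this pattern.
    """
    positions: List[Optional[Set[str]]] = []
    for element in pattern.split("-"):
        count = 1
        if element.endswith(")"):
            element, count_text = element[:-1].split("(")
            count = int(count_text)
        if element == "x":
            allowed = None
        elif element.startswith("["):
            allowed = set(element[1:-1])
        else:
            allowed = {element}
        positions.extend([allowed] * count)
    return positions


def first_mismatch(pattern: str, peptide: str) -> int:
    """Return the 1-based pattern position the peptide violates, or 0 if it fits."""
    positions = expand(pattern)
    if len(peptide) != len(positions):
        raise ValueError(f"expected {len(positions)} residues, got {len(peptide)}")
    for i, (allowed, residue) in enumerate(zip(positions, peptide.upper()), start=1):
        if allowed is not None and residue not in allowed:
            return i
    return 0


print(len(expand(KUNITZ)))                          # 17
print(first_mismatch(KUNITZ, "LADAEDRAVGGGGGYAI"))  # 0  (invented peptide that fits)
print(first_mismatch(KUNITZ, "LADAEDRAVGGGGGFAI"))  # 15 (F where Y is required)
```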

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980). 

[ 2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk 
V. FEBS Lett. 267:13-15(1990). 

[3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989).

[4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaya K., Nakamura Y., Kurihara Y. J. Biol. Chem. 264:6655-6659(1989).

[5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989).

[6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993).

[7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 14:51-59(1989).
[8] Tai H., McHenry L., Fritz P.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991).

[...] fatty acid chain. It is found as a component of the following enzymatic systems: - Fatty acid synthetase (FAS),
which catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant
chloroplast FAS are composed of eight separate subunits which correspond to different enzymatic activities;
beta-ketoacyl synthase is one of these polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2;
the beta-ketoacyl synthase domain is located in the C-terminal section of FAS2. Vertebrate FAS consists of a single
multifunctional chain; the beta-ketoacyl synthase domain is located in the N-terminal section [2]. - The multifunctional
6-methylsalicylic acid synthase (MSAS) from Penicillium patulum [3]. This is a multifunctional enzyme involved in the
biosynthesis of a polyketide antibiotic, and it has a KAS domain in its N-terminal section. - Polyketide [...] enzyme
systems. Polyketides are secondary metabolites produced by microorganisms and plants from simple fatty acids. KAS is one
of the components involved in the biosynthesis of the Streptomyces polyketide antibiotics granaticin [4], tetracenomycin
C [5] and erythromycin. - Emericella nidulans multifunctional protein Wa. Wa is involved in the biosynthesis of conidial
green pigment. Wa is a protein of 216 Kd that contains a KAS domain. - Rhizobium nodulation protein nodE, which probably
acts as a beta-ketoacyl synthase in the synthesis of the [...] chain. - Yeast mitochondrial protein CEM1. The condensation
reaction is a two-step process: the acyl component of an activated acyl primer is transferred to a cysteine residue of
the enzyme and is then condensed with an activated malonyl donor with the concomitant release of carbon dioxide. The
sequence around the active site cysteine is well [...]

due] 

[1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. Commun. [...](1988).

[2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).

[3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990).
[4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727-2736(1989).

[5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989).

[0941] 320. Kinesin motor domain signature and profile

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin
is an oligomeric complex composed of two heavy chains and two light chains. The kinesin motor activity is directed
toward the microtubule's plus end. The heavy chain is composed of three structural domains: a large globular N-terminal
domain which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind and move on
microtubules), a central alpha-helical coiled coil domain that mediates the heavy chain dimerization, and a small
globular C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and
membranous organelles. A number of proteins have recently been found that contain a domain similar to the kinesin motor
domain [4]: - Drosophila claret segregational protein (ncd). Ncd is required for normal chromosomal segregation in
meiosis in females and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the
microtubule's minus end. - Drosophila kinesin-like protein (nod). Nod is required for the distributive chromosome
segregation [...]. [...] CENP-E is a protein that associates with [...] and/or spindle elongation. - [...], which is
essential for yeast nuclear [...]. [...] needed for neuronal [...]. - [...] mitosis. - Arabidopsis thaliana [...] twisting
of the microtubules of the [...]. - Chlamydomonas [...]. - Caenorhabditis elegans hypothetical protein T09A5[...]. The
kinesin motor domain is located in the [...] part of most of these proteins, with [...] located in the C-terminal [...].
[...] half of the domain is [...]

[4] Endow S.A. Trends Biochem. Sci. [...]

[0943] Consensus pattern: [...]-K-[LIVM](2)-[...]
[...](2)-A-x(3)-[LIVM]-x(3)-G-[...]
[...] Suzuki K. Protein Seq. Data Anal. 5:301-313(199[...]).

Consensus pattern: [PA]-[...]
[...] Flaggs G.W., Gray P.W., Wright S.D., Mathison J.C., Tobias P.S., Ulevitch R.J. [...]

[...] of six touch receptor neurons in this nematode.
[...] binds to one of the two cis-[...]

[0947] Consensus pattern: C-x(2)-[...] [... bind zinc]

[1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990).
[...] 1465-1466(1992).
[...] (1993).

[...] Plant lipid transfer protein family (LTP) [1,2,3] [...] schematic representation.

[Schematic: LTP sequence showing its conserved cysteines and the position of the pattern (*); 'C': conserved cysteine involved in a disulfide bond.]
[0951] Consensus pattern: [...] [The two C's are involved in disulfide bonds]

[...] Wirtz K.W.A. Annu. Rev. Biochem. 60:73-99(1991).
[0952] 326. (LAMP) Lysosome-associated membrane






proteins:

- Xenorhabdus luminescens lipase 1.
- Vibrio mimicus arylesterase.
[...] J. Bioenerg. Biomembr. 22:451-471(1990).
[...] Lau P.C.K. Protein Eng. 2:15-20(1988).

[...] cel-3. - Haemophilus influenzae proteins Pal and [...]. - Klebsiella [...] surface antigens [...] (lppL).
- Pseudomonas solanacearum endoglucanase egl. - [...] proteins mxiJ and mxiM. - Strep[...]
[...] binds [...] for Glu in the last [...]

[0962] Consensus pattern: [LIVM]-[...]

[...] NAD-dependent [...]

[...] a conserved histidine which is [...]

[0969] Consensus pattern: [LIVMA]-G-[EQ]-H-[...] [H is the active site residue]

[2] Hendriks W., Mulders J.W.M., [...]

[0972] 334. Legume lectins signatures

[...] lectins [1,2]. These lectins are generally [...] are transposed and ligated (by formation of a new peptide bond)
[...]. [...] patterns specific to legume lectins have [...]

[1] Sharon N., Lis H. FASEB J. [...]
[2] Lis H., Sharon N. Annu. Rev. Biochem. 55:35-67(1986).

[0974] 335. CoA-ligases [...]

[...] [5] for this protein family. Proteins known to belong to this family [...]: [...] - Alpha-1-acid glycoprotein [...].
- Complement component C8 gamma chain, which seems to bind retinol [7]. - Crustacyanin [...]. [...] - Insecticyanin, a
moth bilin-binding protein, and [...]. - Late lactation protein (LALP), a milk protein from tammar wallaby [10].
- Neutrophil gelatinase-associated lipocalin (NGAL) (p25) ([...] 24p3 protein) [11]. - Odorant-binding protein (OBP),
which binds [...]. - Plasma retinol-binding proteins (PRBP). - Human pregnancy-associated endometrial alpha-2 globulin
[...]. [...] enzymatic activity [12]. [...] which could transport small hydrophobic molecules [...]. [...] core or kernel
lipocalins, are [...]
[0981] Consensus pattern: [...]

<PDOC00188> families [3,19].

[...] Protein Sci. [...](1993).

[19] Flower D.R. FEBS Lett. 333:99-102(1993).

[...] D.R., North A.C.T. [...]



arachidonate 5-lipoxygenase (EC [...]) [...] (EC 1.13.11.33). The iron atom in lipoxygenases is [...]

[...] enzymes. These enzymes are: - Fumarase (EC 4.2.1.2), which catalyzes the reversible hydration of fumarate to
L-malate. [...] are thermolabile dimeric [...] are thermostable and tetrameric [...].

[4] Williams S.E., Woolridge E.M., Ransom [...] 9768-9776(1992).
[...] also known as DNA po[...]. - [...] also known as CDC54, cdc21 (in S.pombe) or dpa (in [...]). - [...] (in S.pombe).
- MCM6, also known as [...]. - [...] also known as CDC47 or Prolifera (in A.thaliana). This family is also present in
archebacteria [...] and MJECL13. The presence [...] implies that these proteins may be involved in an ATP-consuming step
in the initiation of DNA replication. [...] found in ATP-binding proteins.

[0990] 341. Macrophage migration inhibitory factor [...]

[...] protein called macrophage migration inhibitory factor (MIF) [...] an important role in host inflammatory responses.
It plays a pivotal role in the host response to [...] and appears to serve as a pituitary "stress" [...]. [...] of 115
residues which is not processed from a larger precursor. [...]

[0991] Consensus pattern: [DE]-P-[...]

[0992] 342. MIP family signature

Recently, the sequences of a number of different [...] transmembrane channel proteins have been found to be highly
related [1 to 4]. These proteins are: - [...] MIP, the major component of lens fiber gap junctions. Gap junction
channels are involved in the exchange of ions and small molecules from one cell to another. - Mammalian [...]. These
proteins form water-specific channels that provide the plasma membranes of red cells and kidney proximal and collecting
tubules with high permeability to water, thereby permitting water to move in the direction of an osmotic gradient.
- [...] a major component of the peribacteroid membrane induced during nodulation [...]. - Plant tonoplast intrinsic
proteins (TIP). There are various isoforms of TIP [...] and WSI (water stress induced). These proteins may allow the
diffusion of water [...]. - Bacterial glycerol facilitator protein (gene glpF), which facilitates the movement of
glycerol across the cytoplasmic membrane. - Salmonella typhimurium propanediol diffusion facilitator (gene pduF).
- Yeast FPS1, a glycerol uptake/efflux facilitator protein. - Drosophila [...]. This protein may mediate intercellular
communication; it may function by allowing the transport of [...] and thereby sending a signal for an exodermal cell to
become an epidermoblast instead of a neuroblast. - Yeast hypothetical protein YFL054c. - A hypothetical protein [...].
These proteins contain six transmembrane segments. Computer analysis shows that [...] an ancestral protein that
contained [...]. As a signature pattern, a well conserved region was selected which is located between the second and
third transmembrane regions.
[0993] Consensus pattern: [HNQA]-x-N-P-[STA]-[LIVM]-[...]

[1] Reizer J., Reizer A., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:235-257(1993).

[...] Sandal N.N., Saier M.H. Jr. Mol. [...]
[0994] 343. Mandelate racemase / muconate [...]

Mandelate racemase (EC 5.1.2.2) (MR) and muconate lactonizing enzyme (EC 5.5.1.1) (MLE) are two bacterial enzymes
involved in aromatic acid catabolism. They [...], yet they are related at the level of their primary, quaternary
(homooctamer) and tertiary structures [1,2]. A number of other proteins also seem [...] chloromuconate cycloisomerase
[...].

Consensus pattern: [...]-E-[DENQ]-P-[...] [D and E bind a metal ion]

[1003] 347. O-methyltransferase [...] S-adenosyl methionine.

[...] Hunter CN, Henningsen KW; J Bacteriol 1998;180:699-704.

[1009] 349. Plasmid recombination enzyme (Mob_Pre) [...] independent of RecA. In such

[1012] 350. Monooxygenase 

[1] Aravind L, Ponting CP; Protein Sci 1998;7:1250-1254. [2] Hershey JW, Asano K, Naranda T, Vornlocher HP, Hanachi P,
Merrick WC; Biochimie 1996;78:903-907.

[1018] 352. Myc [...] helix-loop-helix leucine zipper class of transcription factors; see HLH [...]
[1] Facchini LM, Penn LZ; FASEB J 1998;12:633-651. [2] Grandori C, Eisenman RN; Trends Biochem Sci [...]
[1023] 354. MAGE family

Members of the MAGE family are expressed in a wide variety of tumors but not in [...]

[...] of malate into pyruvate, important for a wide range of [...]. There are three [...] forms of malic enzyme [1,2,3]:
- NAD-dependent malic enzyme [...], which uses preferentially NAD and has the ability to decarboxylate oxaloacetate
(OAA). It is found in [...]. - NAD-dependent malic enzyme (EC 1.1.1.39), which uses preferentially NAD and is unable to
decarboxylate OAA [...] and is a heterodimer of highly related subunits. - NADP-dependent malic enzyme (EC 1.1.1.40),
which has a preference for NADP and has the ability to decarboxylate OAA. [...] Plants also have two isozymes:
chloroplastic and cytosolic. There are two other proteins which are closely related to malic enzymes: - Escherichia coli
protein sfcA, whose function is not yet known but which could be a NAD- or NADP-dependent malic enzyme. - Yeast
hypothetical protein YKL029C, a probable malic enzyme. [...] as a signature pattern for these enzymes.

[...] 269:2827-2833(1994).
[1029] 356. (matrixin)

Matrixins cysteine switch (Peptidase_M10)

[1030] Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>),
are zinc-dependent [...]. They are secreted by cells in an inactive form (zymogen) that differs from the mature enzyme
by the presence of [...]. A conserved octapeptide is found two residues downstream of the [...]. This region has been
shown to be involved in autoinhibition of matrixins [2,3]; a cysteine within the octapeptide chelates the zinc ion, thus
[...]. [...] This region is referred to as the cysteine switch.

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- [...]
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]
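The cysteine-switch octapeptide belongs to the inactive zymogen; in matrixins it lies in the N-terminal propeptide. A minimal sketch of a search restricted to an N-terminal window follows. It uses the PROSITE cysteine-switch pattern P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ]. The window length is a hypothetical parameter, not a value from the text, and the sequence is invented. The restriction relies on the optional `pos`/`endpos` arguments of Python's compiled-pattern `finditer`.

```python
import re

# P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ]; the cysteine (third residue) chelates the zinc ion.
CYSTEINE_SWITCH = re.compile(r"PRC[GN].P[DR][LIVSAPKQ]")


def switch_cysteines(sequence: str, window: int = 120) -> list:
    """1-based positions of the switch cysteine for hits lying wholly in the first `window` residues.

    `window` is a hypothetical stand-in for the propeptide length, not a value from the text.
    """
    # endpos=window makes the regex engine treat the sequence as if it ended there.
    return [m.start() + 3 for m in CYSTEINE_SWITCH.finditer(sequence.upper(), 0, window)]


zymogen = "M" + "A" * 20 + "PRCGVPDL" + "A" * 200 + "PRCGVPDL"  # invented sequence
print(switch_cysteines(zymogen))              # [24]
print(switch_cysteines(zymogen, window=300))  # [24, 232]
```

With the default window, only the N-terminal copy of the motif is reported; widening the window also picks up the downstream copy.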
357. Vertebrate metallothioneins signature (metalthio)

Metallothioneins (MT) [...] bind heavy metals such as zinc, copper, cadmium, nickel, etc., through clusters of thiolate
bonds. [...] and some prokaryotes. On the basis of structural relationships [...] but with clearly related primary
structure. Class II groups together MTs from various species such as sea urchins, [...] which display none or only very
distant correspondence to class I MTs. Class III MTs are atypical polypeptides containing gamma-[...]. [...] 20 of these
residues are cysteines that bind to 7 bivalent metal ions. [...] N-terminal section of class I MTs.

[1034] Consensus pattern: C-x-C-[...]

[1] Hamer D.H. Annu. Rev. Biochem. [...]
[2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988).
[3] Binz P.-A. Thesis, 1996. University of Zurich.

[1035] 358. Mitochondrial energy transfer proteins

Different types of substrate carrier proteins involved in energy transfer are found in the inner mitochondrial membrane. These include:
- [...] which exports [...] into the cytosol and imports malate or other [...] into the mitochondrial matrix. This protein plays a role in several metabolic processes, such as the malate/aspartate and the oxoglutarate/isocitrate shuttles.
- The phosphate carrier protein, which transports phosphate groups from the cytosol into the mitochondrial matrix.
- The brown fat uncoupling protein (UCP), which dissipates [...] by transporting protons from the cytosol into the mitochondrial matrix.
- The tricarboxylate transport protein, which is involved in citrate-H+/malate exchange. It is important for the bioenergetics of hepatic cells, as it provides a carbon source for fatty acid and sterol biosyntheses, and NAD for the glycolytic pathway.
- The yeast mitochondrial proteins MRS3 and MRS4. The exact function of these proteins is not known. They [...] in the first intron of the COB gene and may act as carriers, exerting their suppressor activity by modulating solute concentrations in the mitochondrion.
- Yeast mitochondrial FAD [...].
- Yeast protein ACR1, which seems essential for acetyl-CoA synthetase activity.
- Yeast protein RIM2.
- Yeast protein YHM1/SHM1.
- Yeast hypothetical proteins YBR291c, YEL006W, YER053C, YFR045W, YHR002W, and YIL006w.
- Caenorhabditis elegans hypothetical protein K11H3.3.

Two other proteins have been found to belong to this family [...] mitochondrial inner membrane:
- Maize amyloplast Brittle-1 protein. This protein could play a role in amyloplast membrane transport.
- Candida boidinii peroxisomal membrane protein PMP47 [7]. PMP47 is an integral membrane protein of the peroxisome and may play a role as a transporter.

These proteins are evolutionarily related. Structurally, [...] a domain of approximately one hundred residues. Each of these domains [...]

[1] Klingenberg M. Trends Biochem. Sci.
[1037] 359. Prokaryotic molybdopterin oxidoreductases [...]

A number of different prokaryotic oxidoreductases that require a molybdopterin cofactor have been shown to share a number of regions of sequence similarity. These enzymes are:
- Escherichia coli respiratory nitrate reductase (EC 1.7.99.4). This enzyme complex allows the bacteria [...]. The enzyme is composed of three chains; the alpha chain (gene narG) [...].
- [...] which also contains a molybdopterin-binding alpha [...].
- [...] reductase (DMSO reductase). DMSO reductase is the terminal [...] compounds. [...] The A chain (gene dmsA) binds molybdopterin.
- Escherichia coli biotin sulfoxide reductase. This enzyme reduces a spontaneous [...]. It may serve as a scavenger, allowing the cell to use biotin [...].
- [...] of this dimeric enzyme binds molybdopterin.
- Salmonella typhimurium thiosulfate reductase [...].
- [...] pneumoniae [...].
- Alcaligenes eutrophus, Escherichia coli [...].

[...] central part of these [...].

[1038] Consensus pattern: [...]

[...] Acta 1057:157-185(1991).
[...] Microbiol. 2:785-795(1988).
[...] (1994).

[1039] 360. Bacterial mutT domain signature

The bacterial mutT protein is involved in the GO system, responsible for removing an oxidatively damaged form of guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from the nucleotide pool. 8-oxo-dGTP is inserted opposite dA and dC residues of the template [...]. MutT specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of pyrophosphate. MutT [...] residues which [...]

These proteins are:
- Bartonella bacilliformis invasion protein A (gene invA).
- Escherichia coli dATP pyrophosphohydrolase.
- Protein D250 from African swine fever viruses.
- Proteins D9 and D10 from a variety of poxviruses.
- Yeast protein YSA1.
- Escherichia coli hypothetical protein yfaO.
- Escherichia coli hypothetical protein yrfE.
- Bacillus subtilis hypothetical protein yqkG.
- Bacillus subtilis hypothetical protein yzgD.
- Yeast hypothetical protein YGL067W.

(1995).

[1042] 361. Myb DNA-binding domain repeat signatures

[...] nuclear DNA-binding protein [...].

'*': position of the patterns.

[...] 11075-11090(1988).
[...] Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991).
[...] Becker D., Paz-Ares J., Saedler H., Salamini F. [...]
[...] Science [...] (1989).
[9] Klempnauer K.-H., Sippel A.E. EMBO J. [...]
[1044] 362. NAD-dependent glycerol-3-phosphate dehydrogenase [...]

[...] dihydroxyacetone phosphate to glycerol-3-phosphate [...] involved in NAD-binding [...]

[...] G-x-N [...] Rossmann M.G. Eur. J. Biochem. 109:325-330(1980).

[1049] [...] Curr Biol 1998;8:226-227.

[...] By random association (A6, A5B, ..., AB5, B6), [...]. NDK are proteins of 17 Kd that act via a ping-pong mechanism in which a histidine residue is phosphorylated [...]; the phosphoenzyme can transfer [...]

[1052] Consensus pattern: N-x(2)-H-[...]
[...]
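The random-association remark in the NDK text can be checked with a short count. Assuming, as the listed forms A6 ... B6 imply, that the hexamer is built from two chain types A and B, and that isozymes are distinguished only by composition (not by subunit arrangement), each isozyme is fixed by the number k of B chains:

```latex
A_{6-k}B_{k}, \qquad k = 0, 1, \dots, 6
\;\Longrightarrow\; 6 + 1 = 7 \text{ compositions: }
A_6,\ A_5B,\ A_4B_2,\ A_3B_3,\ A_2B_4,\ AB_5,\ B_6 .
```

More generally, an oligomer of n subunits drawn from two chain types has n + 1 possible compositions.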

[1053] 366. Nitrite and sulfite reductases [...]

Nitrite reductases (NiR) [...]. There are two types of [...].

[...] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:3[...](1990).
[...] E.L. J. Bacteriol. 173:1544-1553(1991).
[1055] 367. (NMT) Myristoyl-CoA:protein N-myristoyltransferase [...]

[1056] Consensus pattern: E-I-N-F-L-C-x-H-K

[...] Gokel G.W., Gordon J.I. [...] 67:375-430(1993).
[1058] 368. ADP-glucose pyrophosphorylase signatures

[1059] ADP-glucose pyrophosphorylase (EC 2.7.7.27) catalyzes a very important step in the biosynthesis of alpha 1,4-glucans (glycogen or starch) in bacteria and plants: the synthesis of the activated glucosyl donor, ADP-glucose, from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a tetrameric, allosterically regulated enzyme. It is a homotetramer in bacteria, while in plants it is a heterotetramer of two different, yet evolutionarily related, subunits. There are a number of conserved regions in the sequence of bacterial and plant ADP-glucose pyrophosphorylase subunits. Three of these regions were selected [...].

[1] [...] Greene T.W., [...] Smith-White B.J., Okita T.W., Preiss J. [...]
[2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem. 63:535-544(1991).

[...] maintaining the pH of actively metabolizing cells. The molecular mechanisms of antiport are unclear. [...] amino-terminus and a large cytoplasmic region at [...]

[...] Biol. Chem. [...] 272: [...]

370. Sodium:sulfate symporter family [...]

Integral membrane proteins that mediate the intake of a [...] with the concomitant uptake of sodium ions (sodium symporters) can be grouped into a number of distinct families. One of these [...] consists of the following proteins:
- Mammalian sodium/sulfate cotransporter [1].
- Mammalian renal [...] cotransporter [2], which transports succinate and citrate.
- Mammalian intestinal sodium [...] cotransporter.
- Chlamydomonas reinhardtii putative sulfur deprivation response regulator SAC1 [3].
- Caenorhabditis elegans [...].

[2] Pajor A.M. Am. J. Physiol. 270:642-6[...](1996).
[3] Davies J.P., Yildiz F.H., Grossman A. EMBO J. 15:2150-2159(1996).
[1066] 371. Nit[...]-like domain

[...] region between the Nit [...] protein [...] seems to be involved in a nucleophilic attack on the nitrile carbon atom. Cyanide hydratase [...]. The sequence of cyanide hydratase [...]

[...] Kobayashi M., [...]

[1071] 373. NusB family
[...] rRNA biosynthesis by transcriptional antitermination.

Neurotransmitter-gated ion channels [...]

Neurotransmitter-gated ion channels [...] for rapid signal transmission at chemical synapses. They are post-synaptic oligomeric [...] upon the binding of a specific neurotransmitter. Presently, the sequences [...] from five types of neurotransmitter-gated receptors are known:
- The nicotinic acetylcholine receptor (AchR), an excitatory cation channel. In the motor endplates of vertebrates, it is composed of four different subunits (alpha, beta, gamma, and delta or epsilon) with a molar stoichiometry of 2:1:1:1. In neurons, the AchR receptor is composed of two different types of subunits: alpha and non-alpha (also called beta). Nicotinic AchRs are [...].
- The glycine receptor, an inhibitory chloride ion channel. The glycine receptor is a pentamer composed of two different subunits (alpha and beta).
- The gamma-aminobutyric-acid (GABA) receptor, which is also an inhibitory chloride ion channel. The quaternary structure of the GABA receptor is complex; at least four classes of subunits are known to exist (alpha, beta, gamma, and delta), and there are many variants in each class (for example, [...] variants of the alpha class have already been sequenced).
- The serotonin 5HT3 receptor. Serotonin is a biogenic [...] as a neurotransmitter, a hormone and a mitogen. There are seven major groups of serotonin receptors; [...] (5HT1, 5HT2, and 5HT4 to 5HT7) transduce extracellular signals by activating G proteins, while 5HT3 is a ligand-gated cation-specific ion channel which, when activated, causes fast, depolarizing responses in neurons.
- [...] different types of glutamate receptors [...].

[...] and Gly receptors are clearly evolutionarily related [...]. [...] N-terminal extracellular domain of AchR/GABA/[...] similarities are either absent or very weak [...] in AchR, have been shown to form a disulfide bond].

[...] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990).
[4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992).



EP 1 033 405 A2 



[1076] 375. Orotidine 5'-phosphate decarboxylase active site [...]

Orotidine 5'-phosphate decarboxylase (OMPdecase) [...] biosynthesis of pyrimidines, the decarboxylation of OMP [...]. In higher eukaryotes OMPdecase is part of [...], while the prokaryotic and fungal OMPdecases are monofunctional [...].

[1] Jacquet M., Guilbaud R., Garreau H. Mol. Gen. Genet. 211:441-445(1988).
[2] Kimsey H.H., Kaiser D. J. Biol. Chem. 267:819-824(1992).

[1078] 376. ATP synthase delta [...] subunit signature

ATP synthase [...] [2] is a component of the cytoplasmic membrane of eubacteria [...]

Consensus pattern: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-x-[LIVM]-[KRHENQ]-x-[GSEN]

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Engelbrecht S., Junge W. Biochim. Biophys. Acta 1015:379-390(1990).

[1080] 377. Aspartate and ornithine carbamoyltransferases [...]

Aspartate carbamoyltransferase [...] catalyzes the conversion of aspartate and carbamoyl phosphate [...] of pyrimidine nucleotides [1]. In prokaryotes [...] is also involved in the degradation of arginine [4] [...] phosphate.

This region was selected as [...] [bind carbamoyl phosphate]

[1] Lerner C.G., Switzer R.L. J. Biol. Chem. 261:11156-11165(1986).
[2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[3] Takiguchi M., Matsubasa T., Amaya Y., Mori M. BioEssays 10:163-166(1989).
[...] R.B., Lipscomb W.N. Proc. Natl. Acad. Sci. U.S.A. 81:4037-4040(1984).
[...] annulus. Oleosins may have a structural role [...] lipase anchorage in lipolysis during [...]. [...] variable length (from 30 to 60 residues), a central [...] is proposed to be made up [...].
[...] Murphy D.J., Keen J.N., O'Sullivan J.N., Au D.M.Y., Edwards E.-W., Jackson P.J., Cummins I., [...]

[...]

[1] Schoehn G, Moss SR, Nuttall PA, Hewat EA; Virology 1997;236:191-200.

[...] two different families on the basis of sequence similarities [1,2,3]. The second family [...]:
- Eukaryotic ornithine decarboxylase (EC 4.1.1.17) (ODC). ODC catalyzes the transformation of ornithine into putrescine.
- Prokaryotic diaminopimelic acid decarboxylase (EC 4.1.1.20) (DAPDC). DAPDC catalyzes the conversion of diaminopimelic acid into lysine.
- [...]
- [...] biosynthetic arginine decarboxylase (EC 4.1.1.19) (ADC). ADC catalyzes the transformation of arginine into agmatine, the first step in the biosynthesis of putrescine from arginine.

[...] Two of the conserved regions were [...].

[5] Moore B.C., Boyle S.M. J. Bacteriol. 172:4631-4640(1990).

[1089] Consensus pattern: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K

[1] Butler W.T. Connect. Tissue Res. 23:123-136(1989).
[2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992).
[3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993).
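The consensus patterns in this section are written in PROSITE pattern syntax: elements are separated by '-', '[..]' lists the residues allowed at a position, 'x' stands for any residue, and a number in parentheses repeats the preceding element. As an illustrative sketch only (not part of the original disclosure), the Python function below translates such a pattern into a regular expression that the standard `re` module can search. The helper name `prosite_to_regex` and the test sequence are hypothetical; exclusions ({..}) and terminal anchors (< >) are deliberately rejected rather than guessed at.

```python
import re

# One PROSITE element: a residue class [..], a single residue, or the wildcard x,
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate e.g. '[KQ]-x-[TA]-x(2)' into the regex '[KQ].[TA].{2}'."""
    parts = []
    for element in pattern.strip().split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            # Exclusions such as {P} and anchors such as < are not handled here.
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if high is not None:
            token += "{%s,%s}" % (low, high)
        elif low is not None:
            token += "{%s}" % low
        parts.append(token)
    return "".join(parts)


if __name__ == "__main__":
    # Pattern [1089] from the text; the sequence is a made-up example.
    regex = prosite_to_regex("[KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K")
    print(regex)  # [KQ].[TA].{2}[GA]SSEEK
    hit = re.search(regex, "MAKDTLNGSSEEKR")
    print(hit.start() + 1 if hit else None)  # 1-based start: 3
```

Because the result is ordinary Python regex syntax, `re.search` returns the first hit in a sequence; screening many sequences is a loop over them with the same compiled pattern.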



[...] have been shown [1] to be evolutionarily related:

- Yeast HES1 and KES1, highly related proteins of 434 residues that seem to play a role in ergosterol synthesis.
- Yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis.
- Yeast hypothetical protein YHR001W (437 residues).
- Yeast hypothetical protein YHR073w (996 residues).
- Yeast hypothetical protein YKR003W (448 residues).

[...] part of this domain, a region that contains a conserved pentapeptide, was selected [...].

[1094] 383. FMN oxidoreductase
[1095] 384. Oxidoreductase FAD/NAD-binding domain
Number of members: 250
[1]
[...] squash NADH:nitrate reductase [...] A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH;
J Biol Chem 1991;266:23542-23547.

[...] fragment of corn nitrate reductase at [...] A resolution: relationship to other flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;

[...] similarity. These enzymes are:
- Xanthine dehydrogenase [...].
- [...]

[...] NADH-ubiquinone/plastoquinone oxidoreductase (complex I), various chains



[1]
[...] oxidoreductase (complex I) [...]
Q Rev Biophys 1992;25:253-324. [...] NADH-ubiquinone/plastoquinone oxidoreductase chain 6. 179 members.

[...]

The 20 Kd subunit is highly similar to [4]:
- [...] NADH-ubiquinone oxidoreductase.
- [...] formate [...] (hycG).
- Escherichia coli hydrogenase-4 (gene hyfI).

[...] 4Fe-4S ligand]

It is also detectable in many proliferating [...]. p53 seems to act as a tumor suppressor in some, but probably not all, tumor types. It is frequently mutated [...] regulation, and may be [...].

[4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990).
[...] Anderson C.W., Mercer W.E., Appella E. [...]
[1107] 391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase [...]
[...] that catalyzes the terminal step in [...]

[...]-[LMF]-[DENQK]

392. Poly-adenylate binding protein, unique domain.

Phenylalanine ammonia-lyase (EC 4.3.1.5) (PAL) [...] of plant and fungal phenylpropanoid metabolism, which is involved in the biosynthesis of [...] metabolites such as flavonoids, furanocoumarin [...]. [...] many important roles in plants during normal growth [...].

[...] the terminal [...]

[1] Taylor R.G., Lambert M.A., Sexsmith E., Sadler S.J., Ray P.N., Mahuran D.J., McInnes R.R. J. Biol. Chem. [...]
[...] Retey J. Biochemistry 33:6463-6467(1994).

[1112] 394. PAS domain

CAUTION: This family does not currently match all [...] examples of PAS domains.
PAS motifs appear in archaea, eubacteria and eukarya. Probably the most surprising identification of a PAS domain was that in EAG-like K+-channels [1,3].
Number of members: 308
[1]
[...]
Zhulin IB, Taylor BL, Dixon R;
Trends Biochem Sci 1997;22:331-333.
[2]
[...] unusual fold, active site, and chromophore.
Borgstahl GE, Williams DR, Getzoff ED;
Biochemistry 1995;34:6278-6287.
[3]Medline: 98044337
PAS: a multifunctional domain family comes to light.
Ponting CP, Aravind L;
[...] This protein belongs to a [...] that also includes:

[...]

[...] sequence of these proteins [...]
[1114] Consensus pattern: [...]

[1] Seddiqi N., Bollengier F., Alliel P.M., Perin J.-P., Bonnet F., Bucquoy S., Jolles P., Rieger F. J. Mol. Evol. [...]

[...] PINT motif [...] (Int-6, Nip-1 and TRIP-15) [1].
Number of members: 49

[...] in complexes.
[...] subunits [...]

[...] L-aspartyl and L-asparaginyl [...]

[1120] Consensus pattern: [...]
Consensus pattern: [...]
[1121] 399. (PDT) Prephenate dehydratase signatures

Prephenate dehydratase (PDT) catalyzes the decarboxylation of prephenate into phenylpyruvate. In microorganisms PDT is involved in the terminal [...] of phenylalanine. In some bacteria, PDT [...] also catalyzes the transformation of chorismate into prephenate.

[...] bacteria, such as Bacteroides [...] pyruvate and lactate are used as a carbon source [...] the first enzyme of the phosphoenolpyruvate [...]

[...] phosphoenolpyruvate [...] protein [...]

[...] specific [...]

[...]-[GAS]-x(2)-R

[...] (PEPCK_ATP) Phosphoenolpyruvate carboxykinase (ATP) [...]
[...] which is located in the central part of the enzyme.
[1130] Consensus pattern: L-I-G-D-D-E-H-[...]

[1131] 403. (PEPcase) Phosphoenolpyruvate carboxylase active sites signatures

Phosphoenolpyruvate carboxylase (EC 4.1.1.31) (PEPCase) catalyzes the [...] of phosphoenolpyruvate by bicarbonate to yield oxaloacetate and phosphate. [...] signature patterns for this type of enzyme [...].

[1132] Consensus pattern: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH] [H is an active site residue]
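Several patterns in this section carry a bracketed note naming a functional residue, such as the note on the [1132] phosphoenolpyruvate carboxylase pattern that the H is an active-site residue. A minimal sketch, assuming the pattern reads [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH] and that the annotated residue is the first element written literally as H, wraps each element in its own capture group so the residue's position in a matching sequence can be read off directly. The function names and the example sequence are hypothetical.

```python
import re

# Pattern [1132] as read from the text; its note marks the H as an active-site residue.
PEPCASE_PATTERN = "[VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH]"


def element_to_regex(element: str) -> str:
    """Map one PROSITE element (A, [AB], x or x(n)) to a regex fragment."""
    m = re.fullmatch(r"(\[[A-Z]+\]|[A-Z]|x)(?:\((\d+)\))?", element)
    if m is None:
        raise ValueError(f"unsupported pattern element: {element!r}")
    residue, count = m.groups()
    token = "." if residue == "x" else residue
    return token + ("{%s}" % count if count else "")


def annotated_site_positions(sequence: str, pattern: str, site: str = "H") -> list[int]:
    """Return the 1-based position of the annotated residue in each match.

    Each pattern element becomes one capture group, so the group belonging
    to the first element equal to `site` shows where that residue landed.
    """
    elements = pattern.split("-")
    site_group = elements.index(site) + 1  # capture groups are numbered from 1
    regex = "".join("(" + element_to_regex(e) + ")" for e in elements)
    return [m.start(site_group) + 1 for m in re.finditer(regex, sequence)]


# Hypothetical sequence containing one match of the pattern.
print(annotated_site_positions("MSVGTAHPTEAARKLL", PEPCASE_PATTERN))  # [7]
```

Capturing every element keeps the mapping from pattern element to sequence coordinate explicit; a functional residue that sits inside a class such as [KRH] would need its element index passed in directly instead of being looked up by letter.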

probably at the level of translation. 
- Aspergillus nidulans mitochondrial protein nempA. 

[...] the N-terminal section was selected [...]-[LIV]-E-[LIV]-x-[ST]-x-P

[1137] 405. (PFK) Phosphofructokinase signature

Phosphofructokinase (EC 2.7.1.11) (PFK) is a key regulatory enzyme in the glycolytic pathway. It catalyzes the phosphorylation by ATP of fructose 6-phosphate to fructose 1,6-bisphosphate. In bacteria PFK is a tetramer of identical 36 Kd subunits. In mammals it is a tetramer [...]; each 80 Kd subunit consists of two homologous [...], which are highly related to the bacterial [...]. [...] there are three tissue-specific types of PFK isozymes [...]. [...] PFK is an octamer composed of four 100 Kd alpha [...]. [...] was selected.

[1138] Consensus pattern: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R [The R/K, the H and the Q/R are involved in fructose-6-P binding]

[1139] 406. (PGAM) Phosphoglycerate mutase family phosphohistidine signature

[...] structurally related [...]:
- The degradation of 2,3-DPG [...].
[...] PGAM [...] the M (muscle) and B (brain) forms. In [...] a phosphohistidine [...].

[...] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).

[1142] Consensus pattern: [...]
Consensus pattern: [...]-[LIVMFY]-x(4)-[FY]-[DN]-x-G-V-E-x(2)-K
Consensus pattern: [QS]-x-[LIVM]-[...]

[1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans. R. Soc. Lond. [...] Sci. 293:145-157(1981).
[...] 34:544-545(1992).

[...] protein; each domain is composed of [...] repeats [...]
[...] 1-phosphate into D-mannose 6-phosphate. PMM [...]. For example, in enterobacteria such as Escherichia coli, [...] the enzyme rfbK [...] involved in the synthesis of the O antigen [...] of the M antigen capsular [...]. [...] (gene algC) is involved in the biosynthesis of the alginate layer [3] and [...] of xanthan [4]. In Rhizobium strains [...] biosynthesis of the nod factor.
- Phosphoacetylglucosamine mutase (EC 5.4.2.3), which converts N-acetyl-D-glucosamine 1-phosphate into the 6-phosphate isomer.
- Urease operon protein ureC from Helicobacter pylori.
- Escherichia coli protein mrsA.
- [...] phosphoglycoprotein involved in exocytosis.

[...]

Note: PMM from fungi do not belong to this family.
[1] Dai J.-B., Liu Y., Ray W.J. Jr., Konno M. [...]
[2] Stevenson G., Lee S.J., Romana L.K., Reeves P.R. [...]
[...]754-9763(1991).

[1147] 410. PH domain

[...] a domain of about 100 residues that occurs in a wide range of proteins [...]:
- binding to the [...] subunit of heterotrimeric G proteins,
- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate,
- binding to phosphorylated Ser/Thr residues,
- attachment to membranes by an unknown mechanism.

[...] residues within the PH domain [...].

Proteins reported to contain one or more PH domains belong to the following families:
[...]
- [...] delta; isoform gamma contains two [...].
- [...] YHR073w [...] residues.
- Oxysterol binding proteins [...].
- Mouse protein citron, a [...] of rho and rac.
- Several yeast proteins like BEM2, BEM3, BUD4 and the BEM1-binding proteins BOI2 (BEB1) and [...].
- Caenorhabditis elegans protein MIG-10.
- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and [...].
- Yeast hypothetical proteins YBR129C and YHR155w.

In some of these cases, the profile will align only to one half of the PH domain.

[...] cutoff threshold.

[...] Clark K.L., Baltimore D. Cell 73:629-630(1993).

(1994).



411. PHD-finger
[1]
Aasland R, Gibson TJ, Stewart AF;
Trends Biochem Sci 1995;20:56-59.
Number of members: 181

[...] Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C [...] phosphorylation and binding of regulatory proteins [2 to 4]. [...] their domain structure, their regulation, and [...] between these two regions is only 50 to 100 residues [...]. The two conserved regions have been shown [...] possibly involved in Ca-dependent membrane attachment [...] similarity to the X-box domain occurs also in prokaryotic and [...]

[...] Second Messenger Phosphoprotein Res. 26:35-61(1992).
[4] Sternweis P.C., Smrcka A.V. [...]
[1149] 413. (PI-PLC-Y) [...]

[...] enzyme, plays an important role in signal transduction processes [1]. It catalyzes the [...] inositol-1,4,5-triphosphate. This catalytic process is tightly [...]. The two conserved regions have been shown [...] possibly involved in Ca-dependent membrane attachment [...] similarity to the X-box domain occurs also in prokaryotic and [...]

[...] Second Messenger Phosphoprotein Res. [...]
[4] Sternweis P.C., Smrcka A.V. [...]

[1150] 414. (PK) Pyruvate kinase active site signature

Pyruvate kinase (EC 2.7.1.40) (PK) catalyzes the conversion of phosphoenolpyruvate to pyruvate with the concomitant phosphorylation of ADP to ATP. PK requires both magnesium and potassium ions for its activity. PK is found in all living organisms. In vertebrates, there are four tissue-specific isozymes: L (liver), R (red cells), [...]. In Escherichia coli there are two isozymes: PK-I [...]. [...] acid residues. [...] a conserved region was selected that includes a lysine residue which seems to be the [...].

[1153] 415. (PLDc) Phospholipase D. Active site motif
[...] ADP-ribosylation factors (ARFs).
[...] proteins [...] domain duplication, which is apparent from the presence of two motifs containing [...]
[...] repeat units has not been achieved.
Number of members: 139
[1]
[...] identification [...]
Ponting CP, Kerr ID;
Protein Sci 1996;5:914-922.
[2]Medline: 96334293






A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes poxvirus envelope proteins.
Koonin EV;
Trends Biochem Sci 1996;21:242-243.

Wang X, Xu L, Zheng L;
J Biol Chem 1994;269:20312-20317.

Singer WD, Brown HA, Sternweis PC;

PMIs are proteins of about 42 to 50 Kd which bind a [...]. The first one is located in the N-terminal [...] of the zinc ion.

[1155] Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand]

[1156] 417. (PNP_UDP_1) Purine and other phosphorylases family 1 signature
[...] following [...]:
- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria (gene deoD). This enzyme catalyzes the [...] molecules [1].

[1157] Consensus pattern: [GST]-x-[...]

[...] to a different family of phosphorylases (see <PDOC00954>).
[...] phosphatases (EC 3.1.3.16). PP2C [1] is a [...] for its activity. Its exact physiological role [...]
[...] and Paramecium tetraurelia. [...] phosphatase (KAPP) [3] is an enzyme that dephosphorylates [...], which catalyzes dephosphorylation [...] of the pyruvate dehydrogenase complex. [...] located in the N-terminal part and contains a [...].

Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

Note: PP2C belongs [6] to a superfamily which also includes bacterial proteins such as Bacillus subtilis spoIIE and rsbU and Synechocystis strain PCC 6803 icfG, as well as a domain in fungal adenylate cyclases.

[1] Wenk J., Trompeter H.-I., Pettrich K.-G., Cohen P.T.W., Campbell D.G., Mieskes G. FEBS Lett. 297:135-138.
[2] Maeda T., Tsai A.Y.M., Saito H. Mol. Cell. Biol. [...]
[3] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., [...] 266:793-795(1994).
[...] 32:8987-8993(1993).
[6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).
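Signature scanning reports every position at which a pattern occurs, including hits that overlap. Python's `re.finditer` resumes after the end of each match, so a sketch meant to mimic scanning can wrap the translated pattern in a zero-width lookahead. The example below applies this to the PP2C signature read as [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]; the function names and the test sequence are illustrative assumptions, and the simple translator handles only single residues, residue classes and x (no repeat counts).

```python
import re

# PP2C signature as read from the text (PROSITE syntax; no repeat counts).
PP2C_PATTERN = "[LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]"


def pattern_to_regex(pattern: str) -> str:
    """Translate single residues, residue classes and 'x' into regex syntax."""
    return "".join("." if element == "x" else element for element in pattern.split("-"))


def scan_all(sequence: str, pattern: str) -> list[int]:
    """Return the 1-based start of every match, overlapping hits included.

    The zero-width lookahead (?=...) consumes no characters, so
    re.finditer tries again at the very next position instead of
    resuming after the end of the previous hit.
    """
    regex = re.compile("(?=" + pattern_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]


# Hypothetical sequence in which two PP2C-like motifs share one residue (V).
seq = "MKLLGLFDGHVAGIYDGHA"
print(scan_all(seq, PP2C_PATTERN))  # [3, 11]
# A plain, non-overlapping search misses the second motif:
print([m.start() + 1 for m in re.finditer(pattern_to_regex(PP2C_PATTERN), seq)])  # [3]
```

The lookahead only changes which start positions are reported; the matched residues themselves can still be recovered by putting a capture group inside the lookahead if needed.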

[1160] 419. (PPTA) Protein prenyltransferases alpha subunit [...]

Protein prenyltransferases catalyze the [...] four residues from the C-terminus [...].

Known protein prenyltransferase alpha subunits are:
- Mammalian protein farnesyltransferase alpha subunit.
- Yeast protein RAM2, a protein farnesyltransferase alpha subunit.
- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit.

The conserved domain of the alpha subunit [...].
Consensus pattern: [...]

[1162] [1] Boguski M.S., Murray A.W., Powers S. New Biol. 4:408-411(1992).

Protein phosphatase 2A (PP2A) is a [...] including the regulation of [...] transduction. PP2A is a trimeric enzyme that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit (PR65), also called subunit A; this complex then associates with a third, variable subunit (subunit B), which confers distinct properties to the holoenzyme [1]. One of the forms of the variable subunit [...] conserved in [...]. This subunit could perform [...].

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIV]-N-S-D

[1165] [1] Mayer-Jaekel R., Hemmings B.A. Trends Cell Biol. 4:287-291(1994).
[1166] 421. N-(5'-phosphoribosyl)anthranilate (PRA) isomerase
[1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. [...]

[...] one of these [...]-D-x-[...]-E
[1168] Consensus pattern: K-[LIVM]-[...]

[...] pyrimidines, histidine and tryptophan [...].

[...] stability and activity [...]. In yeast there are at least [...]



Number of members: 31
[1]
Medline: 98362148
[...] packaging proteins [...] packaging.
Yu D, Weller SK;

[...] transmembrane region.

- [...] peptide transporter PTR2.
- [...] oligopeptide transporter [...].
- [...]
- Drosophila optl.
- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as the histidine transporting protein [...]).
- Caenorhabditis elegans hypothetical protein K04E7.2.
- Escherichia coli hypothetical protein ybgH.
- Escherichia coli hypothetical protein ydgR.
- Escherichia coli hypothetical protein yhiP.
- Escherichia coli hypothetical protein yjdL.
- Bacillus subtilis hypothetical protein yclF.

[...] (PUM-HD, Pumilio homology domain) [...] as a tandem repeat of 8 [...].

[...] PLD [...] a PI3K isoform [...]

Number of members: 71
[1]
[...] PtdIns 3-kinases: binding partners of SH3 domains
Ponting CP;
Protein Sci 1996;5:2353-2357.
[1181] 430. ParA family ATPase 
[1] 

Motallebi-Veshareh M, Rouch DA, Thomas CM;
Mol Microbiol 1990;4:1455-1463. 
432. Pectinesterase signatures

Pectinesterase [...] catalyzes the hydrolysis of pectin into pectate and methanol. In [...] play a role in the catalytic mechanism [1]. The second pattern is located in the central part of these enzymes.

[...]

[1184] 433. Pentapeptide repeats (8 copies)

[...] repeats are found [...].

[...] Anabaena sp. strain PCC 7120.
Black K, Buikema WJ, Haselkorn R;
J Bacteriol 1995;177:6440-6448.

Bateman A, Murzin A, Teichmann SA;

[...] Robinson C, Schroder WP;
FEBS Lett 1998;428:241-244.
[1185] 434. Polypeptide deformylase

[...]

Meinnel T, Blanquet S, Dardel F;
J Mol Biol 1996;262:375-386.
[2]Medline: 98332750

J Mol Biol 1998;280:501-513.






[...] central part.

[3] Ouzounis C., Bork K., [...]
[1187] 436. (Peptidase M17) Cytosol aminopeptidase [...]
[...] catalyzes the removal of unsubstituted [...]

[...] Commun. 178:1459-1464(1991).
[...] Lipscomb W.N. J. Mol. Biol. 224:113-140(1992).

[2] Burley S.K., David P.R. [...] 176:[...]

[1188] 437. Assemblin (Peptidase family S21) 
[1] 

Stallings WC; 
Nature 1996;383:279-282. 

[...] pollen-specific protein ZmC13. These proteins are [...] probably [...].

'C': conserved cysteine involved in a disulfide bond.

'*': position of the pattern.

Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two C's are probably involved in disulfide bonds]
[...] Monsalve R.I., Gonzalez De La Pena M.A., Lahoz C. [...]

Number of members: 49

[...]

[...] Lol p I and II sequences.
Ansari AA, Shenbagamurthi P, Marsh DG;

[...] eukaryotic sources.

Consensus pattern: [...]

441. Presenilin

[...] disease [2]. It has been found that presenilin-1 [...]

Number of members: 23
[1]
Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[2]Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.

Zhang W, Han SW, McKeel DW, Goate A, Wu JY;
J Neurosci 1998;18:914-922.

Staufenbiel M, Sommer B, van de Wetering M, Clevers H,
Saftig P, De Strooper B, He X, Yankner BA;
Phosphoribosyltransferases (PRT) are [...]. Members of PRT are involved in the biosynthesis [...], which is involved in [...].
[...] and/or be part of the PRPP binding site [1].

[...] In position 11 of the pattern, most of these enzymes have Gly.
[1196] [1] Hershey H.V., Taylor M.W. Gene 43:287-293(1986).
[1197] 443. (ProCA) Prokaryotic-type carbonic anhydrases signatures

Carbonic anhydrases (EC 4.2.1.1) (CA) [...] which catalyze the reversible hydration of carbon dioxide. In Escherichia coli, CA (gene cynT) is involved in [...] decomposition of cyanate by cyanase (gene cynS). By this action, [...] [2]. In photosynthetic bacteria and plant chloroplasts [...]. [...] chloroplast CAs are structurally [...] distinct from the one which groups the many different forms of eukaryotic CAs (see <PDOC00146>). [...] developed for this family of enzymes.

[1198] 444. (Prolyl_oligopep)

- Prolyl endopeptidase (EC 3.4.21.26) [...].
- [...] terminal side of lysyl and [...] residues [...].
- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially [...] with a free amino-terminus.

[...] (about 700 to 800 amino acids).
Consensus pattern: [...]-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
[...]
[2] Barrett A.J., Rawlings N.D. [...]

[4] [...] enzymes.

[...] Winkler M.E. J. Bacteriol. 177:[...](1995).

Consensus pattern: [...]
[...]
[...] have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2 (genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes, the removal of signal peptides is effected by [...] composed of at least five subunits: the signal peptidase complex (SPC), located in the endoplasmic reticulum membrane. Two components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit, have been shown [5] to share regions of [...] IMP1/IMP2. Three signature patterns have been developed for these proteins. [...] in the SPC subunits, and the third signature [...].

[1215] [1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. [...].
[2] [...] J. Biol. Chem. 267:13154-13159(1992).
[3] Black M.T. J. [...].
[4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993).
[5] van Dijl J.M., de Jong A., [...] G., Bron S. EMBO J. 11:2819-2828(1992).
[6] Rawlings N.D., Barrett A.J. [...].

[1216] 449. (Peptidase_C1)

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes [...]. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed below (references are only provided for recently determined sequences).
- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15) and S (EC 3.4.22.27) [2].
- [...] (also known as cathepsin C) [2].
- Vertebrate calpains [...], which contain both an N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems [...].
- [...] An enzyme that catalyzes the inactivation of the antitumor drug [...].
- [...] kidney bean EP-C1, rice bean SH-EP, [...] papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain [...]; pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); [...] rice oryzains alpha, beta and gamma; tomato low-temperature induced, Arabidopsis thaliana A494, RD19A and RD21A.
- [...] proteinases from the worms Caenorhabditis [...], Schistosoma mansoni (antigen SM31) and japonicum (antigen [...]) and Ostertagia ostertagi (CP-1 and CP-3).
- [...] CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) [...].
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- [...] cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein [...].
- [...] BLH1/YCP1/LAP3.
- Caenorhabditis elegans [...].
[...] peptidases are also part of this family: [...] (gene pepC) [5]; thiol protease tpr from Porphyromonas gingivalis.
Three other proteins are structurally related to this family, but may have lost their proteolytic activity:
- Soybean oil body protein P34, [...] replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced by a [...]. [...] LIM-domain protein (see <PDOC00382>).

Consensus pattern: [...]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFYG]-x-[LIVMF] [N is the active site residue]

[...] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:[...].
[...] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1220] 450. (peptidase M24) Aminopeptidase P and proline dipeptidase signature

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl [...]. [...] proline dipeptidase (gene pepQ) [2], and human proline [...].

Consensus pattern: [...]

[...] 183-228(1995).

[3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).

[4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., [...] Gonzalez F.J., Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell Biol. 12:1-51(1993).

[5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

[...] Keen NT, [...]; Science 1993;260:1503-1507.

[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1232] Methionine aminopeptidases signatures

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAPs studied to date [...]. Two subfamilies of MAP enzymes are known to exist [1,2]. While [...], they share only a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family [...]. The second is made up of archaebacterial [...]. The second subfamily also includes proteins which do not [...], such as [...] proliferation-associated protein 1 and fission [...]. [...] include residues known to be involved in cobalt-binding.

Consensus pattern: [...]-x(4)-[LIVM]-x-[...] [H is a cobalt ligand]

Consensus pattern: [...] [... are cobalt ligands]

[1] [...] Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).

[2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. [...].
[3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).
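The MAP substrate rule above (the initiator Met is removed only when the penultimate residue is small and uncharged) can be written as a simple predicate. The following is a minimal Python sketch, not taken from the source: it assumes that "small and uncharged" means G, A, S, C, T, P or V, a set commonly cited for methionine aminopeptidases, and the function name `predict_met_removal` is hypothetical.

```python
# Hypothetical helper illustrating the MAP substrate rule described above.
# Assumption: "small and uncharged" penultimate residues are taken to be
# G, A, S, C, T, P and V; this set is not stated in the source.
SMALL_UNCHARGED = frozenset("GASCTPV")


def predict_met_removal(protein: str) -> str:
    """Return the predicted N-terminus after MAP processing.

    The initiator Met is predicted to be cleaved only when the sequence
    starts with M and the penultimate (second) residue is small and uncharged.
    """
    seq = protein.upper()
    if len(seq) >= 2 and seq[0] == "M" and seq[1] in SMALL_UNCHARGED:
        return seq[1:]
    return seq


if __name__ == "__main__":
    for s in ["MAKV", "MKAV", "MPEE", "M"]:
        print(s, "->", predict_met_removal(s))
```

Under this rule MAKV is processed to AKV while MKAV keeps its Met. Real processing also depends on other sequence and cellular factors that this sketch ignores.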

[1234] 454. Peroxidases signatures

Peroxidases (EC 1.11.1.-) [1] are heme-binding enzymes that carry out a variety of biosynthetic and degradative functions using hydrogen peroxide as the electron acceptor. Peroxidases are [...] distributed throughout bacteria, fungi, plants, and vertebrates. In peroxidases, the [...] heme iron is a histidine (known as the proximal histidine). [...] distal histidine) serves as an acid-base catalyst in the reaction between hydrogen peroxide and [...]. The regions around these two active site residues are more or less conserved in a [...]. [...] which one or both of these regions can be found are listed below.
- Yeast [...].
- Myeloperoxidase (EC 1.11.1.7) (MPO). MPO is found in granulocytes and [...] system of neutrophils.
- [...] acts as an antimicrobial agent.
- Eosinophil [...], found in the cytoplasmic granules of eosinophils.
- Thyroid peroxidase (EC 1.11.1.8) (TPO), which plays [...] the iodination and coupling of the hormonogenic tyrosines in thyroglobulin to yield the thyroid hormones T3 and T4.
- Fungal ligninases. Ligninase catalyzes the first step in the degradation of lignin [...] the C(alpha)-C(beta) cleavage [...].
- [...] a large number of isozymes of peroxidases, some of which [...] response toward wounding, [...] of the aromatic residues of suberin on the cell [...]. Some others are involved in the [...].
- [...] bacterial species produce [...] and perA from Bacillus stearothermophilus.

[1] Dawson J.H. Science 240:433-439(1988).

[1236] 455. pfkB family of carbohydrate kinases signatures

It has been shown [1,2,3] that the [...] carbohydrate and purine kinases are evolutionarily related and can be [...]:
- Fructokinase (EC 2.7.1.4) (gene scrK).
- 6-phosphofructokinase [...].
- [...] (gene ADK1).
- 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) [...].
- [...] (gene gsk).
- Tagatose-6-phosphate kinase (EC 2.7.1.144) (phosphotagatokinase) (gene lacC).
- [...] Escherichia coli hypothetical protein y[...].
- Bacillus subtilis hypothetical protein yxdC.
[...] Two of these regions were selected as signature patterns. The first pattern [...] amino acid residues that is located in the N-terminal section of these enzymes, while the second pattern [...] a conserved region in the C-terminal section.

[1238] 456. Phospholipase A2 active sites signatures

Phospholipase A2 (EC 3.1.1.4) (PA2) releases fatty acids from the second carbon group of glycerol. PA2's are small and rigid proteins of 120 amino-acid residues [...]. PA2 binds a calcium ion which is required for activity. The side chains of two conserved residues, a histidine and an aspartic acid, participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards, bees and mammals. In the latter, there are at least four [...] as well as two less characterized forms. [...] are presynaptic neurotoxins which inhibit [...] from mulga [...].

[...] Eur. J. Biochem. 186:23-33(1989).

[1240] 457. Phosphorylase pyridoxal-phosphate attachment site

[...] allosteric enzymes in carbohydrate metabolism. They [...] such as glycogen, starch or maltodextrin. Enzymes [...] differ in their regulatory mechanisms and in their natural substrates. However, [...] and structural properties. They are pyridoxal-phosphate dependent enzymes; the pyridoxal-phosphate group [...] conserved and can be [...].
[...] regions in the catalytic domain of protein kinases. Two of these regions [...]. The first region, which is located in the N-terminal extremity of the catalytic domain, [...] ATP binding. The second region, which is [...]. [...] in this region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]
Most serine/threonine specific protein kinases are detected by the pattern, with 10 exceptions (half of them viral kinases). Epstein-Barr virus B[...] and [...] have respectively Ser and Arg instead of the conserved Lys and are therefore detected by the tyrosine kinase specific pattern described below.

[1244] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]
All tyrosine specific protein kinases are detected by the pattern. This pattern also detects [...] aminoglycoside phosphotransferases [8,9] and herpesvirus ganciclovir kinases [10], which are structurally and evolutionarily related to protein kinases.
This profile also detects receptor [...] and [...]-dependent ribonucleases. Sequence similarities between these two families and [...] have been noticed before. It also detects Arabidopsis thaliana kinase-like protein TMKL1, [...] activity. If a protein analyzed includes the [...] kinase is close to 100%. Eukaryotic-type protein [...]

[1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995).
[2] Hunter T. Meth. Enzymol. [...].

[...] Benner S. Nature 329:21-21(1987).
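The serine/threonine and tyrosine kinase catalytic-loop patterns differ at a single position (a conserved Lys versus one of R, S, T, A or C), so a query can be tested against both and the catalytic Asp located at the same time. A minimal Python sketch, assuming the patterns read [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) and [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3); the regular expressions are hand translations, and the function name and test peptides are illustrative assumptions.

```python
import re

# Hand-translated consensus patterns (assumed readings, see lead-in).
# The named group "asp" marks the catalytic Asp ("D is an active site residue").
PATTERNS = {
    "Ser/Thr kinase": re.compile(
        r"[LIVMFYC].[HY].(?P<asp>D)[LIVMFY]K..N[LIVMFYCT]{3}"),
    "Tyr kinase": re.compile(
        r"[LIVMFYC].[HY].(?P<asp>D)[LIVMFY][RSTAC]..N[LIVMFYC]{3}"),
}


def classify_catalytic_loop(seq: str) -> list:
    """Return (label, 1-based position of the catalytic D) for every hit."""
    hits = []
    for label, rx in PATTERNS.items():
        for m in rx.finditer(seq.upper()):
            hits.append((label, m.start("asp") + 1))
    return hits
```

For example, `classify_catalytic_loop("LVHRDLAARNVLV")` reports a single tyrosine kinase hit with the Asp at position 5. A sequence hitting neither pattern may still be a kinase, since the text notes exceptions for both patterns.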

[1245] Receptor tyrosine kinase class II signature

A number of growth factors stimulate mitogenesis by [...] cell surface receptors which possess an intrinsic, ligand-sensitive, protein [...]. These receptor tyrosine kinases (RTK) all share the same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain. However, they can be classified into at least [...] groups. [...] class II RTKs is the insulin receptor, a heterotetramer of two alpha and two beta chains [...]. [...] products of a precursor molecule. The alpha chain contains the ligand binding site; the beta chain traverses the membrane and contains the tyrosine protein kinase domain. [...]:
- Insulin receptor from vertebrates.
- Insulin [...] from mammals.
- Insulin receptor-related receptor (IRR), which is most probably a receptor for a peptide of the insulin family.
- Insects insulin-like receptors.
- Molluscan insulin-related peptide(s) receptor.
- [...] peptide receptor from Branchiostoma lanceolatum.
- The Drosophila developmental protein [...], a putative receptor for positional information required for the formation of the R7 photoreceptor cells.
- The trk family [...] (NTRK1, NTRK2 and NTRK3), which are high affinity receptors for nerve growth factor and related [...].
- Trk-related receptors: ROS; LTK (TYK1); EDDR1 (cak, TRKE, [...]); [...] (Tyro10, TKT).
- A sponge putative receptor tyrosine kinase.
While only the insulin and the insulin [...] receptors are known to exist in the tetrameric [...], [...].

Consensus pattern: D-[LIVM]-Y-x(3)-Y-Y-R [The second Y is the autophosphorylation site]
[...] Rev. [...] 57:443-478(1988).
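The bracketed note on the class II pattern (the second Y is the autophosphorylation site) can be turned into a residue coordinate by capturing that position in a regular expression. A minimal Python sketch, assuming the pattern reads D-[LIVM]-Y-x(3)-Y-Y-R; the regex is a hand translation and `autophosphorylation_sites` is a hypothetical helper.

```python
import re

# Assumed reading of the class II RTK pattern D-[LIVM]-Y-x(3)-Y-Y-R.
# The named group "site" marks the second Y of the pattern, the residue
# annotated as the autophosphorylation site.
RTK2 = re.compile(r"D[LIVM]Y.{3}(?P<site>Y)YR")


def autophosphorylation_sites(seq: str) -> list:
    """1-based positions of the annotated Tyr for every pattern match."""
    return [m.start("site") + 1 for m in RTK2.finditer(seq.upper())]
```

Capturing the site with a named group keeps the pattern and its annotation together, so the reported coordinate is correct regardless of where the motif starts in the sequence.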

Receptor tyrosine kinase class III signature

[...] cell surface receptors which possess [...].
- [...] Flk-2/Flt-3 [5].
- The putative receptor Flt-4 [...].
[...] on a conserved region [...].

[1249] Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T

[...] Oncogene 8:1233-1240(1993).

Consensus pattern: C-x[...] [... probably involved in disulfide bonds]

[3] Wicks I.P., Wilkinson D., Salvaris E., Boyd A.W. [...]

[1252] 459. Protein kinase C terminal domain

[1253] 460. Plant thionins signature

[...] animal cells [1]. They seem to exert their toxic effect at the [...] to this family. The pattern to detect this family of proteins [...].



[...] bonds.



EP 1 033 405 A2 



'C': conserved cysteine involved in [...]



[...] Bacillus subtilis spore germination protein C3 (gene gerC3), [...]
[...]noid metabolism [6].
[...]-D-x(2,4)-D-x(4)-R-R-[GH]-[...]

[1257] 462. Potato inhibitor I family signature

[...] proteinase inhibitors. Members of this protein [...].

[1258] 463. (pp binding)

Phosphopantetheine (or pantetheine [...]) is the prosthetic group of acyl carrier proteins (ACP) in some multienzyme complexes, where it serves as a [...] of activated fatty acid and amino-acid groups [1]. Phosphopantetheine is attached to a [...] these proteins [2]. ACP proteins or domains have been found in various enzyme systems [...] (references are only provided for recently determined sequences).
- Fatty acid synthetase (FAS), [...] of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant [...] are composed of eight separate subunits which correspond to the different enzymatic activities; [...] polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the ACP domain is [...] section of FAS2. Vertebrate FAS consists of a single multifunctional enzyme; the [...] the beta-ketoacyl reductase domain and the C-terminal thioesterase domain [3].
- Polyketide antibiotic [...] enzyme systems. Polyketides are secondary metabolites produced from simple fatty acids [...]. ACP is one of the polypeptidic components involved in the biosynthesis of [...] curamycin, granaticin, [...], oxytetracycline and tetracenomycin C.
- Bacillus subtilis [...] polyketide synthases pksK, pksL [...], which respectively contain three, five and one ACP domains.
- 6-methylsalicylic acid synthase (MSAS) from Penicillium patulum. This is [...] the biosynthesis of a polyketide antibiotic and which contains an ACP domain in the C-terminal extremity.
- [...] This enzyme catalyzes the [...] complete ACP-like domain.

Consensus pattern: [...]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-{PC[...]}-[...] [S is the pantetheine attachment site]

[2] Pugh E.L., Wakil S.J. J. Biol. Chem. [...]
[...] Eur. J. Biochem. 198:571-579(1991).
[...] Eur. J. Biochem. 200:[...](1991).
[1260] 464. (Prenyltrans) Terpene synthases signature

[...] involve the highly complex cyclic rearrange[...]. [...] Hopene synthase (EC 5.4.99.-) [...] enzymes are related [...] residues and which is located in the C-[...].
Consensus pattern: [...]-x-[FY]-x-[...]-[GA]

[...] with a number of degenerative neurological diseases such as [...] spongiform encephalopathy (BSE). PrP is encoded in the host genome and is expressed both in normal and infected cells. It has a tendency to aggregate [...]. [...] peptide, followed by an N-terminal [...] repeats of a short motif (PHGGGWGQ in mammals, PHNPGY in chicken), [...]; then comes a C-terminal hydrophobic domain, post-[...] side of the cell membrane by a GPI-anchor. The [...]

+-[Sig]-[Tandem repeats]----C----C----[Sig]-+

[...] patterns. As signature pattern for PrP, a perfectly conserved [...] involved in the disulfide bond.

[3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989).

[1265] 466. Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature

Cyclophilin [1] is the major high-affinity binding protein in vertebrates for [...] cyclosporin A (CSA). It exhibits a peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [2]. It is probable that CSA mediates some of its effects via an inhibitory action on PPIase. Cyclophilin is a cytosolic protein which belongs to a family [3,4,5] that also includes the following isozymes:
- Cyclophilin B (or S-cyclophilin), a [...] PPIase [...].
- Mitochondrial matrix cyclophilin (CyP3).
- A PPIase which seems [...] protein anchored by a C-terminal membrane region. This protein has been characterized in Drosophila (gene ninaA).
- Bacterial periplasmic PPIase [...].
- [...] cell cyclophilin-related protein. This large protein (about [...] Kd) [...] in the function of NK cells [...] PPIase domain.
- Yeast hypothetical protein [...].
- [...] pore complex protein of 358 Kd [...].
- Caenorhabditis elegans hypothetical protein [...].
[...] a conserved region was [...].

[1266] [...] FK506, are also PPIases, but their sequence is not at all related to that of cyclophilin.
[...] which belong to different phyla (ranging from fungi to [...]).

[1269] 468. Protamine P1 signature

Protamines are small, highly [...] chromatin during the haploid phase [...] stable and inactive complex.

[...] both the sequence of the mature P2 protein and its propeptide.
[1273] 470. Proteasome A-type subunits signature

The proteasome (or macropain) (EC [...]) [...] complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes, the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, A and B. Subunits that belong to the A-type group are [...] that share a number of conserved sequence regions. Subunits that are known to belong to this family are listed below.
- Vertebrate subunits C2 (nu), C3, C8, C9, iota and zeta.
- [...] C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, [...].
- Arabidopsis thaliana subunits alpha and [...].
- Thermoplasma acidophilum alpha [...]. [...] proteasome is composed of only two different subunits.
As a signature pattern for proteasome A-type subunits, a conserved region was selected, which is located in the N-terminal part of these proteins.

Consensus pattern: [...]-Y-[SAD]-x(2)-[SAG]-[...]

[1] Rivett A.J. Biochem. J. 291:1-10(1993).
[2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[4] Wilk S. Enzyme Protein 47:187-188(1993).
[5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1275] Proteasome B-type subunits signature

The proteasome (or macropain) (EC [...]) is a eukaryotic and archaebacterial multicatalytic proteinase complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes, the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, A and B. Subunits that belong to the B-type group are [...] that share a number of conserved sequence regions. [...] are listed below.
- Vertebrate subunits [...] C7-I and MECL-1.
- Yeast PRE1, PRE2 [...].
[...] of these proteins.

Consensus pattern: [GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-[...]
[...] peptidases [6].

[1] Rivett A.J. Biochem. J. 291:1-10(1993).
[2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).

[...] together the following enzymes [2 to 6]:
- Glutathione reductase [...].
- Lipoamide dehydrogenase (EC 1.8.1.4) [...].
- Trypanothione reductase [...].
[...] The sequence around [...] and from cyanobacteria, which have Ile-Arg [7].

[3] Brown N.L. Trends Biochem. Sci. [...].
[4] Carothers D.J., Pons G., Patel M.S. Arch. Biochem. Biophys. 268:409-425(1989).
[5] Walsh C.T., Bradley M., Nadeau K. Trends Biochem. Sci. [...].
[...] 373:5-9(1995).

[1280] Consensus pattern: S-[LIVM]-[...]-[RK] [K is the pyridoxal-P attachment site]

[...] ran, a nuclear GTP-binding protein, [...] as a guanine-nucleotide dissociation stimulator (GDS) [2]. The interaction of RCC1 with ran probably plays an important [...] fission yeast and BJ1 in [...] C-terminal domain of about 130 residues.

[1] Dasso M. Trends Biochem. Sci. 18:96-101(1993).
[2] Boguski M.S., McCormick F. Nature 366:643-654(1993).
[...] Bleeker-Wagemakers L.M., Bergen A.A. [...]

[1283] 474. RNA 3'-terminal phosphate cyclase signature

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [...] conversion of a 3'-phosphate to a 2',3'-cyclic phosphodiester at the end of RNA. The [...] unknown, but it is likely to function in some aspects of cellular RNA processing. The [...] in three steps: 1) adenylation of the enzyme by ATP; 2) the enzyme acts on RNA-3'-terminal phosphate to produce RNA-3'-terminal diphosphate adenylate; 3) release of AMP and cyclization by [...] phosphorus in the diester linkage. This enzyme, which has [...] (where there seems to be at least three isozymes) and Escherichia coli (gene rtcA), [...] widespread. It is found in insects, plants, fungi (gene RTC1 in yeast) and in archaebacteria [...] 36 to 42 Kd. The best [...] residues located in the central part of the sequence [...].

[1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997).
[2] [...] W., Vicente O. Meth. Enzymol. 181:499-510(1990).

[...] and induce release of the nascent polypeptide. Class [...] RFs are [...] and enhance class I RF activity. In prokaryotes there are two [...] UAA- and UGA-dependent [...]; RF-1 (gene prfA) mediates UAA- and UAG-dependent termination. RF-1 and RF-2 are structurally [...] and belong to a family that also contains the following proteins:
- [...] RF (mRF), which recognizes the [...].
- Fungal MRF1 [...].
- [...] hypothetical protein [...].

Note that prokaryotic-type class I RFs display no significant [...] to the family of GTP-binding elongation factors nor to eukaryotic class [...] RFs.

[1288] 477. RIO1/ZK632.3/MJ0444 family signature

The following uncharacterized proteins are [...]:
- [...] RIO1.
- Caenorhabditis elegans hypothetical protein ZK632.3.
- [...] MJ0444.
- Thermoplasma acidophilum [...].
[...] family are proteins of about 55 to 60 Kd, while [...]

[1290] [1] Bairoch A. Unpublished observations (1997).
[1291] 478. (RIP) Shiga/ricin ribosomal inactivating toxins active site signature

[...] act by inhibiting protein synthesis in [...]. The toxins of the Shiga and ricin family inactivate 60S ribosomal subunits by an N-glycosidic cleavage which releases a specific adenine base from the sugar-phosphate backbone of 28S rRNA [1,2,3]. The toxins which are known to function [...]:
- [...] Shigella dysenteriae [4]. This toxin is composed of one copy of an A subunit [...] and five copies of a B subunit responsible for binding the toxin complex to specific receptors on the target cell [...].
- Shiga-like toxins (SLT) are a group of Escherichia coli toxins very similar in their structure [...]. The sequence of two types of these toxins, SLT-1 [5] and SLT-2 [6], is known.
- Ricin, a [...] from castor bean seeds. Ricin consists of two glycosylated chains linked by a disulfide bond. The A chain is [...] active. The B chain is a lectin with a binding [...]. [...] precursor. Ricin is classified as a type-II RIP.

[...] Hartley M.R., Roberts L.M., Krieg P.A., Osborn R.W., Lord J.M. EMBO [...]
[...] Y., Sung-Sil K., Kimura M. [...]
[4] [...] Keusch G.T., Mekalanos J.J. Proc. Natl. Acad. Sci. U.S.A. [...]
[5] [...] Sung L.M., Holmes R.K., O'Brien A.D. J. Bacteriol. 170:1116-1122(1988).
[6] Jackson M.P., Neill R.J., O'Brien A.D. [...]
[7] Barbieri L., Battelli M.G., Stirpe F. Bio[...]

[...] Proc Natl Acad Sci USA 1997;94:[...]

RNA polymerase beta subunit (RNA pol B)

[...] prokaryotes contain a single RNA polymerase [...] eukaryotes there are three [...].

[1297] Consensus pattern: H-[...]
[...] sets of genes. Each [...]

[1] [...] Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993).
[2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).
[1300] 483. RNA polymerases L / 13 to 16 Kd subunits signature

In eukaryotes, there are three different forms of [...] RNA polymerases transcribing different sets of genes. Each class of RNA polymerase [...]. In archaebacteria, there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13 polypeptides. It has been shown that small subunits of about [...] found in all three types of eukaryotic polymerases are highly conserved. Subunits [...]:
- [...] RPC19 subunit from RNA polymerases I and III [1].
- Budding yeast RPB11 [...] [2].
- Mammalian RPB11 (gene POLR2K) from RNA polymerase II.
- [...] RNA polymerase subunit L (gene rpoL) [3].
As a [...] at the N-terminal extremity of these polymerase [...].

[3] Langer D. EMBL/GenBank: X70805.

[1302] 484. RNA polymerases [...] signature

In eukaryotes, there are three different forms of [...] RNA polymerases (EC 2.7.7.6) transcribing different sets of genes. Each class of RNA polymerase is [...] to twelve different polypeptides. In archaebacteria, there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13 polypeptides. [...] of about 8 Kd, it is evolutionarily related [2] [...] that could play a role in the binding of a metal ion.

[1303] Consensus pattern: [LIVMF](2)-P-[LIV]-x-[...]-[STK]-[...]

[1] Langer D., Hain J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A. [...](1995).

[1304] 485. Ribonuclease HII
[1] Mian IS; Nucleic Acids Res 1997;25:3187-3195.

[1305] 486. Ribonuclease PH signature
Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [...] is a phosphorolytic exoribonuclease that removes nucleotide residues following the -CCA terminus of tRNA [...] ends of RNA molecules by using nucleoside [...]. It is evolutionarily [...].

[...] 267:17153-17158(1992).

[...]; Cell Growth Differ 1995;6:1213-1224.
[1307] 488. Rhodanese signatures
Rhodanese (thiosulfate sulfurtransferase) (EC 2.8.1.1) [1,2] is an enzyme which catalyzes the transfer of the sulfane atom of thiosulfate to cyanide, to form sulfite and thiocyanate. [...] rhodanese is a mitochondrial enzyme of about 300 amino-acid residues involved in [...] and cyanide detoxification. A cysteine residue takes part in the catalytic mechanism. Some bacterial [...] a sulfotransferase activity. These are:
- Escherichia coli sseA [3].
- Saccharopolyspora erythraea cysA [4].
- Synechococcus strain PCC 7942 [...], a periplasmic protein probably involved in the [...].
[...] They are based on highly [...].

[1308] Consensus pattern: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]
Consensus pattern: [FY]-[DEAF]-G-[SA]-W-x-E-[FYW]

[1] Westley J. Meth. Enzymol. 77:285-291(1981).

[2] Weiland K.L., Dooley T.P. Biochem. J. 275:227-231(1991).

[3] Rudd K.E. Unpublished observations (1993).

[...] in the processing of ribosomal RNA precursors and of [...] mRNA [...] proteins:
- Fission yeast pac1, a ribonuclease that probably [...] required for sexual development.
- Yeast ribonuclease [...] that cleaves eu[...].
- [...] F26E4.13.
- Paramecium bursaria chlorella virus 1 protein A464R.
- [...] and a C-terminal RNase III domain.
- [...] hypothetical protein SPAC8A4.08c, a [...].
These pro[...]

[1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).

[1311] 490. Rieske iron-sulfur protein signatures

Ubiquinol-cytochrome c reductase (EC 1.10.2.2) (also known as the bc1 complex or complex III) is one of the electron [...] the oxidoreduction of ubiquinol and [...], called the Rieske protein [1,2]. The Rieske protein [...] regions in Rieske [...]. [...] conserved regions were selected as signature patterns [...].

[1312] Consensus pattern: C-[TK]-H-L-[...]
Consensus pattern: C-[...] [... The second C [...] bond]

[1] Gatti D.L., Meinhardt S.W., Ohnishi T., Tzagoloff A. [...]
[2] Kallas T., Spiller S., Malkin R. Proc. Natl. Acad. Sci. U.S.A. 85:5794-5798(1988).
[3] Iwata S., Saynovits M., Link T.A., Michel H. Structure 4:567-579(1996).
[DENSTKQ]

[1] Nikonov S.V., Nevskaya N., Eliseikina I.A., Fomenkova N.R., Briand C., Al-Karadaghi S., Svensson L.A., Aevarsson A., Liljas A. EMBO J. [...]
[2] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 220:954-957(1996).

[...] which has only been found so far in eubacteria. A conserved region located in the N-terminal section of [...]
[...] conserved region located in the central section was selected as a signature pattern.
[...] 952-956(1996).

[...] to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L11.
- Plant chloroplast L11 (nuclear-encoded).
- Red algal chloroplast L11.
- Cyanelle L11.
- Archaebacterial L11.
- Mammalian L12.
- Plant L12.
- Yeast L12 (YL15).

L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the C-terminal section of these proteins was selected as a signature pattern. In Escherichia coli, the C-terminal half of L11 has been shown [3] to be in a [...] folded conformation and is likely to be buried within the ribosomal structure.
[1322] Consensus pattern: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG]

[1] Pucciarelli M.G., Remacha M., Ballesta J.P.G. Nucleic Acids Res. 18:[...](1990).
[2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[3] Choli T. Biochem. Int. 19:1323-1338(1989).
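The consensus patterns in these entries use PROSITE syntax: x for any residue, square brackets for allowed residues, curly braces for excluded residues, and parenthesised repeat counts such as (2) or (0,1). A minimal Python sketch of how such a pattern, here the L11 pattern above, can be translated into a regular expression and scanned against a sequence; `prosite_to_regex` and `find_motifs` are hypothetical helpers rather than an API from the source, and the test peptides are synthetic.

```python
import re

L11_PATTERN = ("[RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)"
               "-[LIVM]-x(0,1)-[DENG]")

# One pattern element: optional N-terminal anchor "<", a residue class,
# an optional repeat count "(n)" or "(n,m)", optional C-terminal anchor ">".
_ELEMENT = re.compile(
    r"(<?)(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex."""
    out = []
    for elem in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(elem)
        if m is None:
            raise ValueError(f"unsupported pattern element: {elem!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if lo is not None:
            core += "{" + lo + ("," + hi if hi is not None else "") + "}"
        out.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(out)


def find_motifs(pattern: str, seq: str) -> list:
    """Return (1-based start, matched text) for non-overlapping hits."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in rx.finditer(seq.upper())]
```

Note that `re.finditer` reports non-overlapping hits only; a scanner that must report every overlapping occurrence would need a lookahead or a position-by-position search.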

[1323] 495. Ribosomal protein L7/L12 C-terminal domain

[1324] [1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579.

[...] on the basis of sequence similarities [1], groups:
- Eubacterial L13.
- Plant chloroplast L13 (nuclear-encoded).
- Red algal chloroplast L13.
- Archaebacterial L13.
- Mammalian L13a (Tum-P198).
- Yeast Rp22 and Rp23.

[1326] L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a conserved region was selected [...]
[...]-[GDN]
[...] families consists of:

- [...] L13.
- Yeast probable L13 (YM9375.11c).

[...] third of these proteins was selected.

Consensus pattern: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E

[1330] [1] [...], Wool I.G. Biochem. Biophys. Res. Commun. 201:102-107(1994).
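How specific a pattern such as [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E is can be estimated with simple arithmetic. If all 20 amino acids are assumed to be equally frequent and independent (a simplification; real proteins have skewed composition), the chance that a random peptide of the pattern's length matches is the product over positions of (allowed residues / 20). For this pattern that is 96 / 20^13, about 1.2 x 10^-15, so a database of 10^9 residues would be expected to hold only about 1.2 x 10^-6 chance matches. The Python sketch below computes this exactly; the helper names and the uniform model are illustrative assumptions.

```python
from fractions import Fraction
import re

CONSENSUS = "[KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E"


def match_probability(pattern: str) -> Fraction:
    """Chance that a random peptide of the pattern's length matches it.

    Model (an assumption): 20 equiprobable, independent amino acids.
    Only fixed-length elements are supported: x, [ABC] or a single
    residue, each optionally followed by a repeat count such as (2).
    """
    p = Fraction(1)
    for elem in pattern.split("-"):
        m = re.fullmatch(r"(x|\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?", elem)
        if m is None:
            raise ValueError(f"unsupported element: {elem!r}")
        core, count = m.group(1), int(m.group(2) or 1)
        if core == "x":
            allowed = 20
        elif core.startswith("["):
            allowed = len(core) - 2
        else:
            allowed = 1
        p *= Fraction(allowed, 20) ** count
    return p


def expected_random_hits(pattern: str, n_residues: int) -> float:
    """Approximate chance matches in n_residues (edge effects ignored)."""
    return float(n_residues * match_probability(pattern))
```

Using `Fraction` keeps the product exact, which avoids floating-point underflow concerns for long patterns before the final conversion.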
[1331] 498. Ribosomal protein L14 signature

Ribosomal protein L14 is one of the [...]. In eubacteria, L14 is known to bind [...] which, on the basis of sequence similarities [1], groups:

- [...]
- Mammalian L23.
- Caenorhabditis elegans L23 (B0336.10).
- Higher eukaryotes mitochondrial L14.
- Yeast mitochondrial YmL38 (gene MRPL38).

[...] half of these proteins was selected.
Consensus pattern: G-[...]

[1332] [1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1333] 499. Ribosomal protein L15 signature

[...] In Escherichia coli, L15 is known to bind [...]

- Eubacterial L15.
- Plant chloroplast L15 (nuclear-encoded).
- Archaebacterial L15.
- Vertebrate L27a.
- Tetrahymena thermophila L29.
- Fungi L27a (L29, CRP-1, CYH2).

[...] terminal section of these proteins.
Consensus pattern: [...]-A-x(3)-[LIVM]-x(3)-G

[1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[...] can be grouped on the basis of sequence similarities [...] [1]. One of these families consists of:

- Mammalian L15.
- Insect L15.
- Plant L15.
- Yeast YL10 (L13) (Rp15r).
- Thermoplasma acidophilum L15.

[...] in the central section.
[...] proteins which, on the basis of sequence similarities, groups:
- [...]
- Yeast mitochondrial YmL8 (gene MRPL8).

[...] section was selected.

Consensus pattern: [...]-x-[STHGT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIMV]-[LIVMT]-T-x-[STAGHKR]
One of these families consists of:

- Vertebrate L18 (known as L14 in Xenopus) [1].
- Plant L18.
- Yeast L18 (Rp28).
- Halobacterium marismortui H129.
- Sulfolobus acidocaldarius H129e.

[...] has been selected as a signature pattern.
Consensus pattern: [...]
[1] Puder M., Barnard G.F., Staniunas R.J., Steele G.D. Jr., Chen L.B.
Biochim. Biophys. Acta 1216:134-136(1993).

[1338] 503. Ribosomal L18p family

[...] are necessary and sufficient to bind 5S [...] to the nucleolus [1].
Number of members: 26
[1] Michael WM, Dreyfuss G; J Biol Chem 1996;271:11571-11574.
[1339] 504. Ribosomal protein L19 signature

Ribosomal protein L19 is one of the [...]. In Escherichia coli, L19 is known to be [...] in the structure and function of the aminoacyl-[...].

- Eubacterial L19.
- Red algal chloroplast L19.
- Cyanelle L19.

One of these families consists of:

- Mammalian ribosomal protein L19 [1].
- [...]
One of these families consists of:

- Fission yeast 12.
- Halobacterium marismortui HmaL4 (HL6).
- Methanococcus jannaschii MJ0177.

[...] has been selected as a signature pattern.

[...] Nucleic Acids Res. 21:3900-[...]
[3] Bagni C., Mariottini P., Annesi F., Amaldi F.



[1343] 507. Ribosomal protein L2 signature
Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind ...
- Eubacterial L2.
- Paramecium tetraurelia mitochondrial L2. - Fission yeast
- Yeast YL6. - Vertebrate L8.

The best conserved region, located in the C-terminal section of these proteins, has been selected as a signature pattern.

- Consensus pattern: P-x(2)-R-G-x-[STAIV](2)-x-N-[APK]-x-[DE]

[1344] 508. Ribosomal protein L20 signature
In Escherichia coli, L20 is known to bind

has been selected as a signature pattern. 
One of these families consists of:

- Halobacterium marismortui HL31 [4].

[4] Hatakeyama T., Kimura M. Eur. J. Biochem.

[1346] 510. Ribosomal protein L21 signature
Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind

to the 23S rRNA in the presence of L20. It belongs 

similarities, groups: - Eubacterial L21.
- Marchantia polymorpha chloroplast L21. - Cyanelle L21.

- Spinach chloroplast L21 (nuclear-encoded).

pattern.

[1347] 511. Ribosomal protein L22 signature
Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind
L23.

- Algal and plant chloroplast L23. - Archaebacterial L23. - Mammalian L23A.
- Caenorhabditis elegans L23A (F55D10.2). - Fungi L25.
- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20).

[1349] A small conserved region in the C-terminal section of these proteins, which is probably involved in rRNA binding, has been selected as a signature pattern [2].

- Consensus pattern: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANEQK]-x(7)-[LIVMFT]
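The consensus patterns in this listing use PROSITE notation: residues are separated by "-", "x" stands for any residue, square brackets list allowed alternatives, braces list excluded residues, and a number in parentheses gives a repeat count or range. As a minimal sketch (an illustration, not part of the original specification), the following Python function translates such a pattern into a regular expression so that a protein sequence can be scanned with the standard `re` module. The name `prosite_to_regex` is hypothetical, and only the notation listed in its docstring is handled.

```python
import re

# One pattern element such as "L", "x", "[RK]", "{P}", "x(7)" or "x(4,5)",
# optionally preceded by "<" (N-terminal anchor) or followed by ">" (C-terminal anchor).
_ELEMENT = re.compile(
    r"(<?)(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Supported notation (assumed sufficient for the patterns in this section):
      x           any residue
      [ABC]       one of A, B or C
      {ABC}       any residue except A, B or C
      (n), (n,m)  repeat the preceding element n times, or n to m times
      <, >        anchor at the N- or C-terminus
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        n_anchor, core, low, high, c_anchor = match.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            regex = core  # a single residue or a [...] class is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_anchor else "") + regex + ("$" if c_anchor else ""))
    return "".join(parts)


if __name__ == "__main__":
    # The L23 signature given above.
    l23 = prosite_to_regex("[RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANEQK]-x(7)-[LIVMFT]")
    print(l23)
    hit = re.search(l23, "MMRKAFVRLSGGGGGGGMQQ")  # synthetic test sequence
    print(hit.start(), hit.group())
```

For the L23 pattern the function yields `[RK]{2}[AM][IVFYT][IV][RKT]L[STANEQK].{7}[LIVMFT]`, and `re.search` then reports the first matching stretch of the sequence. A real scan would also need to cope with features this sketch omits, such as ">" written inside a bracketed class.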

[1] El-Baradi T.T.A.L., Raue H.A., van de Regt C.H.F., Verbree E.C., Planta R.J. EMBO J. 4:2101-2107(1985).

[2] Raue H.A., Otaka E., Suzuki K. J. Mol. Evol. 28:418-426(1989).
[3] Fearon K., Mason T.L. J. Biol. Chem. 267:5162-5170(1992).
[4] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

proteins which, on the basis of sequence similarities, groups: - Eubacterial L24.

- Plant chloroplast L24 (nuclear-encoded). - Red algal L24. - Vertebrate L26.
- Yeast L26 (YL33). - Archaebacterial HmaL24 (HL15).
- A probable ribosomal protein from Sulfolobus acidocaldarius [1].

. Co^uspan^GDENI-D^ 

[1] Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995).

One of these families consists [1] of: 

- Mammalian ribosomal protein L24.

- Yeast ribosomal protein L30A/B (Rp29) (YL21).

- Kluyveromyces lactis ribosomal protein L30. 

- Arabidopsis thaliana ribosomal protein L24 homolog. 

- Haloarcula marismortui ribosomal protein HL21/HL22. 
- Methanococcus jannaschii MJ1201.

These proteins have 60 to 160 amino-acid residues. The most conserved region, which is located in the N-terminal region of these proteins, has been selected as a signature pattern.
- Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D

[1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 202:1176-1180(1994).

proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial L27.

- Plant chloroplast L27 (nuclear-encoded). - Algal chloroplast L27.
- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).
[Schematic: alignment of eubacterial and algal L27 sequences, N- to C-terminus.]

- Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G

[2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

omal subunit 



Yeast L35. 



L29 is a protein of 63 to ... amino-acid residues. A conserved region of L29 has been selected as a signature pattern.

[KRCQVT]-[LIVMA]
[1] Otaka E., Hashimoto T., Mizuta K.

... subunit. In Escherichia coli, L3 is known to bind

algal L3. - Cyanelle L3.
- Archaebacterial Halobacterium

Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).

Eur. J. Biochem. 206:373-380(1992).

[3] Herwig S., Kruft V., Wittmann-Liebold B.

Eur. J. Biochem. 207:877-885(1992).

[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K.

Protein Seq. Data Anal. 5:301-313(1993).
- Drosophila L7. - Slime mold L7. - Mammalian L7. - Fungi L7 (YL8).
- Yeast mitochondrial L33.

proteins is shown below.
[Schematic: eubacterial L30 and archaebacterial L30 sequences, N- to C-terminus, with the position of the pattern marked.]

VA]-x(2)-[LMFY]-[IVT]

[1] Mizuta K., Hashimoto T., Otaka E.
Nucleic Acids Res. 20:1011-1016(1992). 

. con 6e n S u 5 p,«.m.H.p. F .lFVHTIl-x(9)-S-RWx-|KROl 
One of these families consists of:

- Mammalian L31 [1]. - Chlamydomonas reinhardtii L31. - Yeast L34.
- Halobacterium marismortui HL30 [2].

. consensus pa«em:V-[KRHL.V^^^^ 
Biochim. Biophys. Acta 1050:56-60(1990).

[1359] 522. Ribosomal protein L33 signature
In Escherichia coli, L33 has been shown

similarities [1,2,3], groups: - Eubacterial L33.

- Algal and plant chloroplast L33. - Cyanelle L33.

been selected as a signature pattern. 

[1] Kruft V., Kapp U., Wittmann-Liebold B.

[2] Sharp P.M. Gene 139:129-130(1994).
[3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1360] 523. Ribosomal protein L34 signature
... prokaryotic ribosome. It is a small
sequence similarities, groups: - Eubacterial L34.
- Red algal chloroplast L34. - Cyanelle L34.

A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.
- Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R
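The L34 pattern includes the variable-length gap x(4,5). Assuming the pattern reads K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R, the sketch below translates it by hand into a Python regular expression; the helper name `find_l34_signature` is hypothetical and the test sequences are synthetic. Wrapping the expression in a zero-width lookahead lets `re.finditer` report hits that overlap one another, which an ordinary search would skip.

```python
import re

# Hand translation of K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R.
# x(n) becomes .{n}; the variable gap x(4,5) becomes .{4,5}.
# The lookahead (?=(...)) consumes nothing, so overlapping hits are also reported.
L34_SIGNATURE = re.compile(r"(?=(K[RG]T[FYWL][EQS].{5}[KRHS].{4,5}GF.{2}R))")


def find_l34_signature(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every signature hit."""
    return [(m.start(), m.group(1)) for m in L34_SIGNATURE.finditer(sequence.upper())]


if __name__ == "__main__":
    # One synthetic sequence uses a 4-residue gap, the other a 5-residue gap.
    print(find_l34_signature("MMKRTFEAAAAAKAAAAGFAARPP"))
    print(find_l34_signature("kgtwqccccchdddddgfnnr"))
```

Both gap lengths are accepted because the regex engine backtracks from five residues to four when the following G does not match.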

[ 1] Old I.G., Margarita D., Saint Girons I. 
Nucleic Acids Res. 20:6097-6097(1992). 

One of these families consists of:
amino-acid residues.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P-G

[ 1] Lan Q., Niu L.L., Fallon A.M. 

Biochim. Biophys. Acta 1218:460-462(1994). 

[ 2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G. 

Plant Mol. Biol. 25:761-770(1994). 

One of these families consists of:

- Vertebrate L35A. - Caenorhabditis elegans L35A (F10E7.7).

- Yeast L37A/L37B (Rp47). - Pyrococcus woesei L35A homolog [1].

pattern.

- Consensus pattern: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P

[ 1] Ouzounis C, Kyrpides N., Sander C. 
Nucleic Acids Res. 23:565-570(1995). 

[1364] 526. Ribosomal protein L36 signature

Ribosomal protein L36 is the smallest protein from the ... prokaryotic ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: - Eubacterial L36. - Algal and plant ...

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

consists of: - Mammalian L36 [1].

- Drosophila L36 (M(1)1B). - Caenorhabditis elegans L36 (F37C12.4).
- Candida albicans L39. - Yeast YL39.

These proteins have 99 to 104 amino acids.
A conserved region in the central part of these proteins has been selected as a signature pattern.
- Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]

[1] Chan Y.-L., Paz V., Olvera J., Wool I.G.

Biochem. Biophys. Res. Commun. 192:849-853(1993).

One of these families consists of: 

Planta R.J. Nucleic Acids Res. 13:701-709(1985).

[3] Ramirez C., Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

Number of members: 27
[1]

... proteins of the large subunit of bovine mitochondrial ribosomes.
Piatyszek MA, Denslow ND, O'Brien TW; 



L40.

Chan YL, Suzuki K, Wool IG; 
Biochem Biophys Res Commun 1995;215:682-690. 

One of these families consists of:

- Mammalian L44 [1]. - Trypanosoma brucei L44.

- Caenorhabditis elegans L44 (C09H10.2). - Fungal L44 (L41).

- Halobacterium marismortui LA [2].

- Consensus pattern: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C

[1] Gallagher M.J., Chan Y.-L., Lin A., Wool I.G. DNA 7:269-273(1988).
[2] Bergmann U., Wittmann-Liebold B.
Biochim. Biophys. Acta 1173:195-200(1993).

[1369] 531. Ribosomal protein L5 signature
... ribosomal subunit. In Escherichia coli, L5 is known to be

basis of sequence similarities [1,2,3,4], groups: - Eubacterial L5.

- Algal chloroplast L5. - Cyanelle L5.

- Tetrahymena thermophila L21. - Slime mold L5 (V18). - Yeast L16 (39A).
- Plants mitochondrial L5.
- Consensus pattern:

Hatakeyama T. Biochim. Biophys.

[3] Yang D., Gunther I., Matheson A.T., Auer J., Spicker ...

[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1370] 532. Ribosomal L5P family C-terminus

This domain is found associated with Ribosomal_L5. Number of members: 60

- Algal chloroplast L6.

- Cyanelle L6.

- Archaebacterial L6.

- Marchantia polymorpha mitochondrial L6.
- Yeast mitochondrial YmL6 (gene MRPL6).

- Mammalian L9.

- Drosophila L9.
- Plants L9.

- Yeast L9 (YL11).

detect archaebacterial L6 as well as eukaryotic L9.

[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

One of these families consists of:

- Caenorhabditis elegans ribosomal protein L6 (R151.3).

- Yeast ribosomal protein YL16A/YL16B.

- Mesembryanthemum crystallinum ribosomal protein YL16-like.

These proteins have 175 (yeast) to 287 (mammalian) amino acids. A highly conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K

One of these families consists of:
- Plant L7A. - Yeast L7A (YL5) (Rp6).
- Vertebrate L7A (SURF3).

- Methanococcus jannaschii MJ1203.

[2] SSubetz D., Burgurh A. Yeast 7:79-90(1991).
[1378] 536. Ribosomal protein L9 signature
In Escherichia coli, L9 is known to bind

groups: - Eubacterial L9. - Cyanobacterial L9.

- Plant chloroplast L9 (nuclear-encoded). - Red algal chloroplast L9.

- Consensus pattern:

[1379] 537. Ribosomal protein S10 signature
In Escherichia coli, S10 is known to be

similarities [1], groups: - Eubacterial S10.

- Arabidopsis thaliana mitochondrial S10 (nuclear-encoded).

- Plant S20. - Yeast URP2.

. consensus pallem: l AV 1 -x ( 3H a DNSRHLI™STAl-x,3)-G-P-lUVM]-x-[UVMl.P-T 

[1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1381] 538. Ribosomal protein S11 signature
... biosynthesis. It is located on

sequence similarities, groups [2]: - Eubacterial S11.

- Mammalian, Drosophila, Trypanosoma and plant S14.
- Caenorhabditis elegans S14 (F37C12.9).

One of the best conserved regions in these proteins was selected as a signature pattern.
x-[PAHSTCHHDN]

[2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 

- Eubacterial S12.

- Algal and plant chloroplast S12. - Cyanelle S12.

- Protozoa and plant mitochondrial S12. ... conserved regions in these

- Consensus pattern: [RK]-x-P-N-S-[AR]-x-R

[1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

consists of: - Vertebrate S12 [1].
- Trypanosoma brucei S12 [2]. - Caenorhabditis elegans S12 (F54E7.2).
- Drosophila S12. - Yeast S12.

. consensus pawn: A^KROPl-x-V-L.x( 2 HSAl-x(3HDN^-L 

- Eubacterial S13.

- Mammalian and plant S18.

. ooo S .„,usp»»«n: pHaSl^-H-x^SN^IVMO.-B-G^ 

[2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 
- Eubacterial S14.

- Algal and plant chloroplast S14.

- Plant mitochondrial S14.
- Yeast mitochondrial MRP2.
- Mammalian S29.

[1388] Consensus pattern:

which, on the basis of sequence similarities [1,2], groups:

[1] Dang H., Ellis S.R.
Nucleic Acids Res. 18:6895-6901(1990).
[2] Otaka E., Hashimoto T., Mizuta K.

Protein Seq. Data Anal. 5:285-300(1993).

[1390] Ribosomal protein S16 signature



- Eubacterial S16.

- Algal and plant chloroplast S16.

- Eubacterial S17.
The
pattern



- Yeast S18a and S18b (RP41; YS12).

Census pausrn: G^xWlUVAl^OEKl-x-IRKhP-ILIVl^ 
[2] Herfurth E., Hirano H., Wittmann-Liebold B.

One of these families consists of:

- Vertebrate S17 [1]. ... MJ0245. These proteins have from 63

... Gene 79:289-298(1989).
[3] Abovich N., Rosbash M.
Mol. Cell. Biol. 4:1871-1879(1984).
[1397] 547. Ribosomal protein S18 signature
In Escherichia coli, S18 has been

[1398] 548. Ribosomal protein S19 signature
Ribosomal protein S19 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S19 is known to form a complex with S13 that ... S19 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

... located in the C-terminal section of these proteins.
- Consensus pattern:

[2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
One of these families consists of: - Mammalian S19. - Drosophila ...

- Yeast ... (YS16, RP55A and RP55B).

- Aspergillus S16. - Halobacterium marismortui HS12.

- Consensus pattern:
[1] Etter A., Aboutanos M., Tobler H., Mueller F.

proteins which, on the basis of sequence similarities [1,2], groups:

other in the central section.

- Consensus pattern:

[AP]

[1] Davis S.C., Tzagoloff A., Ellis S.R.
[1401] Ribosomal protein S21e signature
... small ribosomal subunit. So far S21 has only been

consists of: - Mammalian S21 [1].

- Caenorhabditis elegans S21.
- Yeast S21 (YS25) [3]. - Fission yeast S28 [4].

- Consensus pattern: L-Y-V-P-R-K-C-S-[SA]

[1] ... K.S., Morrison S. Nucleic Acids Res.

[1405] 553. Ribosomal protein S24e signature 
One of these families consists of:

- Halobacterium marismortui HS15 [3]. - Methanococcus jannaschii ...
These proteins have 101 to 148 amino acids.

[2] Sosa L., Fonzi W.A., Sypherd P.S.

consists of: - Mammalian S26 [1].
- Octopus S26 [2]. - Drosophila S26 (DS31) [3]. - Plant cytoplasmic S26.
- Fungi S26 [4].

- Consensus pattern: [YH]-C-V-S-C-A-I-H

H. Gene 150:401-402(1994).

One of these families consists of:

- Mammalian S28 [1]. - Plant S28 [2]. - Fungi S33 [3].
- Methanococcus jannaschii MJ1202.

pattern.

- Consensus pattern: E-[ST]-E-R-E-A-R-x-L

One of these families consists of: 

(F56F3.5). (PLC1 and PLC2).
- Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L

[1] Liu J.H., Reid D.M.

Plant Physiol. 109:338-338(1995). 

[1410] 557. Ribosomal protein S3 signature
In Escherichia coli, S3 is known to be

similarities [1], groups: - Eubacterial S3.

- Algal and plant chloroplast S3. - Cyanelle S3. - Archaebacterial S3.

x(2)-G-x(2)-G

[1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 

and plant chloroplast S4. 

- Cyanelle S4. - Archaebacterial S4. - Mammalian S9. - Yeast YS11 (SUP45).

x-[LIVMF](2)

[1] Mizuta K., Suzuki K.I., Otaka E.

Kriszawska A. Mol. Cell. Biol. 12:402-412(1992).
One of these families consists of:

positions in this region are positively charged residues.
- Consensus pattern: H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-...-[KRP]

[1,2], groups: - Eubacterial S5. 

- Cyanelle S5. - Red algal chloroplast S5. - Archaebacterial S5.

- Mammalian S2 (LLRep3). - Caenorhabditis elegans S2 (C49H3.11).
- Plant S2. - Yeast S4 (SUP44). - Fungi mitochondrial S5.

... glycine residues, and located in the N-terminal section of these proteins.

- Consensus pattern: G-[KRQ]-x(3)-...-x(5,6)-[DEQ]-[LIVMA]-x(2)-A-[LIVMF]

[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1414] 561. Ribosomal protein S6 signature
In Escherichia coli, S6 is known to bind

similarities, groups: - Eubacterial S6. - Red algal chloroplast S6.
- Cyanelle S6.

located in the N-terminal section of these proteins.
- Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]

One of these families consists of:

- Mammalian S6 [1]. - Drosophila S6 [2]. - Plant S6. - Yeast ...
the selective translation of particular classes of mRNA.

. consensus panem: [UVMHSTAMRl-e-G.x-D-x(2)-G-x.P-M 

[1] Franco R. ... Proc. Natl. Acad. Sci. U.S.A.
directly to part of the 3'-end of 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of
sequence similarities [1,2,3], groups: - Eubacterial S7.

- Algal and plant chloroplast S7. - Cyanelle S7. - Archaebacterial S7.
- Plant mitochondrial S7. - Mammalian S5. - Plant S5.

- Caenorhabditis elegans S5 (T05E11.1).

- Consensus pattern:
[STAC]

[1] Klussmann S., Franke P., Bergmann U., Kostka S., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 374:

One of these families consists of:

- Mammalian S7.

- Xenopus S8.

- Insect S7.

- Yeast probable ribosomal protein S7 (N2212).

- Fission yeast probable ribosomal protein S7 (SPAC18G6.13c).

These proteins have about 200 amino acids. A highly conserved region in the ... section, which is rich in charged residues, ...

[1], groups: - Eubacterial S8. - Algal and plant chloroplast S8.

- Cyanelle S8. - Archaebacterial S8. - Marchantia polymorpha mitochondrial S8.

- Mammalian S15A. - Plant S15A. - Yeast S22 (S24).

The best conserved region ...

- Consensus pattern: [GE]-x(2)-[LIV](2)-[STY]-[ST]-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI]

[1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1]. One of these families consists of:
- Mammalian S8. - Caenorhabditis elegans S8 (F42C5.8). - Leishmania major S8.

- Plant S8. - Yeast S8 (S14) (Rp19). - Archaebacterial S8e.

selected as a signature pattern.

- Consensus pattern: [KR]-x(2)-[ST]-G-[GA]-x(5)-[HR]-[KG]-[KR]-x-K-x-E-[LM]-G
[1] Engemann S., Herfurth E., Briesemeister U., Wittmann-Liebold B.
J. Protein Chem. 14:189-195(1995).

- Cyanelle S9. - Archaebacterial S9. - Mammalian S16. - Plant S16.
- Yeast mitochondrial ribosomal S9.

A conserved region containing many charged residues and located in the central section of these proteins has been
selected as a signature pattern.

- Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIVF]

[1] Chan Y.-L., Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).
[2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 

[1424] 568. Ribulose-phosphate 3-epimerase (PPE)

The sequence of PPE is highly related to: 

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE). 

- Escherichia coli protein sgcE. 

- Mycoplasma genitalium hypothetical protein MG112. 

been selected as signature patterns. 

- Consensus pattern: [LIVMP]-H-[LIVMPY]-D-[LIVM]-x-D-x...
5 - Consensus pattern: [ LIVMA]-x-[LIVM]-M-[STHVS]-x-P-x(3)-G-Q-x.F-x(6)-[NKHLIVMC] 

[ 1] Kusian B., Yoo J.G., Bednarski R., Bowien B. 

569. Similarity to lectin domain of ricin beta-chain, 3 copies.
[1426] This family consists of a triplicated domain involved in cell agglutination ...

[1427] 570. (Rotamase) PPIase

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase) accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [1]. ... Recently, a third family has been discovered [2,3]. So far the only ...

is evolutionarily related to a number of proteins that are also probably PPIases:

- Drosophila protein dodo (gene dod). - Mammalian protein PIN1.
- Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen.
- Bacillus subtilis hypothetical protein yacD.

- Helicobacter pylori hypothetical protein HP0175. 

- A hypothetical slime mold protein. 

A ... has been selected as a signature pattern.

[1] Fischer G., Schmid F.X.
Biochemistry 29:2205-2212(1990).

[2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S., Rouviere P.E.
Trends Biochem. Sci. 20:14-15(1995).
[3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J., Mann K., Fischer G.
FEBS Lett. 352:180-184(1994).

found [1,2] to be evolutionarily related. These enzymes are:

... involved in the biogenesis of ribosomes by catalyzing the dimethylation of two adjacent adenosines in the 3'-end of 18S rRNA. ... resistance to macrolide-lincosamide-streptogramin B (MLS)

in a reduced affinity between ribosomes and the MLS antibiotics. 
- Caenorhabditis elegans hypothetical protein E02H1.1.

probably involved in S-adenosyl methionine (SAM) binding.

[STAGV]-[LIVMFYHC]-E-x-D

[1429] 572. (RuBisCO small) Ribulose bisphosphate carboxylase, small chain. 206 members
[1430] 573. ATP/GTP-binding site motif A (P-loop)
From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence or the 'P-loop'.

There are numerous ATP- or GTP-binding proteins in which the P-loop is found. A number of protein families for which the relevance of such a motif has been noted are listed below:
- ATP synthase alpha and beta chains.
- Kinesin heavy chains and kinesin-like proteins.
- Dynamins.
- Thymidine kinase.
- Thymidylate kinase.
- Shikimate kinase.
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7].
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1 alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins.
- ADP-ribosylation factors family.
- Bacterial recF protein.
- Guanine nucleotide-binding proteins alpha subunits.
... are picked up by this motif. A number of ...
found instead of Ser or Thr.

Consensus pattern: also found in a number of other proteins. Most of these proteins 

... Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[7] Higgins C.F., Hyde ...
Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989). 

[1431] GTP-binding nuclear protein ran signature

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which has been implicated in a large number of processes, including nuclear ..., RNA processing and export, and cell cycle checkpoint control [1,2]. Ran is evolutionarily related to the RAS family of small GTP-binding proteins [3], but it is only slightly related to the other RAS proteins: it lacks cysteine residues at its C-terminus, and is therefore not subject to prenylation, and it has an acidic C-terminus. It is, however, similar to RAS family members in requiring ... GTPase activating protein (GAP) as stimulators of overall GTPase activity. A section of the GTP-binding B motif which, in ran, is perfectly conserved has been selected as a signature pattern.

- Consensus pattern: ...-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y-...

Proteins belonging to this

30:4637-4648(1991).

[1432] 574. recA signature

The bacterial recA protein [1,2] is essential for homologous recombination and recombinational repair of DNA damage. RecA has many activities: it forms filaments, it binds to single- and double-stranded DNA, it binds and hydrolyzes ATP, it is also a recombinase and, finally, it interacts with lexA, leading to its autocatalytic cleavage. RecA is a protein of about 350 amino-acid residues; its sequence is well conserved [3,4,5,6] among ... The ... conserved region, a nonapeptide located ...

... Sharp P.M. J. Mol. Evol. 37:399-407(1993).

found N-terminal to a DNA binding effector domain.
[1] Pao GM, Saier MH; J Mol Evol 1995;40:136-154.

[1434] 576. Ribonucleotide reductase large subunit
... synthesis of deoxyribonucleotides from their

these regions has been selected as a signature pattern.

... Trans. 16:91-94(1988).
[2] Reichard P.

Science 260:1773-1777(1993).

EP 1 033 405 A2



... structurally and functionally related ... that cleave the 3'-5' internucleotide linkage of RNA via a nucleotide 2',3'-cyclic phosphate intermediate. A number of other RNases have been found to be evolutionarily related to these fungal enzymes:
- Self-incompatibility in flowering plants is often controlled by a single gene (S gene) that has several alleles. ... by self-pollen or by pollen bearing either of the two S-alleles expressed in the style. The S-gene product of higher plants of the Solanaceae family has been shown [2,3] to be a ribonuclease.
- Phosphate-starvation induced RNases LE and LX from tomato [4]. These two enzymes are probably involved in a phosphate rescue system.
- Escherichia coli periplasmic RNase I (EC 3.1.27.6) (gene rna) [5].
- Aeromonas ...
- Haemophilus influenzae hypothetical protein ...
Two histidine residues have been shown to be involved in the catalytic mechanism of ... in all the sequences described above.

... B.A., Haring V., Ebert P.R., Anderson M.A., Simpson R.J., Sakiyama ... (1990).
[6] ... Loeffler A., Glund K., Irie M. Eur. J. Biochem. ...
[7] Kurihara H., Mitsui Y., Ohgi K., ...

[1437] 578. Ribonucleotide reductase large subunit signature
Ribonucleotide reductase catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. ... a large subunit (... 1000 residues) and a small subunit ... There are ... regions of similarities in the sequence of the large chain from prokaryotes, eukaryotes and ...; one of these regions has been developed as a signature pattern.
- Consensus pattern: ...-[STAC]-[LIVM]-[AS]-x(2)-[PA]-...

Science 260:1773-1777(1993).

... RNA contain one or more copies of a putative RNA-binding domain. Proteins containing this domain include:
- Heterogeneous nuclear ribonucleoproteins: hnRNP A1 (helix ...), hnRNP C (C1/C2) (once), hnRNP E ...
- U1 snRNP 70 Kd (once). - U1 snRNP A (once). - ... snRNP B.
- Protein synthesis initiation factor 4B (eIF-4B) [3].
- Nucleolin (4 times).
- Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA ... and contains nuclear localization sequences.
- Poly(A) binding protein (PABP) (4 times).
- Drosophila protein Sex-lethal (Sxl) (twice).
- Drosophila sex determination protein Transformer-2.
- Drosophila 'elav' protein (3 times), which is probably involved in the RNA metabolism of neurons.
- Human paraneoplastic ... antigen HuD (3 times) [4], which is highly similar to elav and which may play a role in ... processing.
- Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox protein that may also ...
- La antigen (once), a protein which may play a role in the transcription of RNA polymerase III.
- A maize protein induced by abscisic acid in response to ... stress, which seems to be an RNA-binding protein.
- Three tobacco proteins, located ... RNAs (twice).
- X16 [7], a mammalian protein which may be involved in RNA processing in relation with cellular proliferation and/or maturation.
- Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic activity against cytotoxic lymphocyte target cells and may be involved in apoptosis.
- Yeast RNA15 protein, which plays a role in ... poly(A) tail length [9].
Inside the putative RNA-binding domain there are two conserved regions. The first one is a hydrophobic segment of six residues (which is called the RNP-2 motif). The second one is an octapeptide motif (which is called RNP-1 or RNP-CS). The position of these motifs is shown in the following schematic representation:
[Schematic: RNP-2 and RNP-1 motifs within the RNA-binding domain.]
(1992).
[9] Minvielle-Sebastia L., ...
[1443] 581. Rubredoxin signature
... They contain an iron atom which is ligated ...
... (also known as gastricsin). - Vertebrate chymosin. - Mammalian renin (EC 3.4.23.15) ... (EC 3.4.23.18),
...-[GTA]. D is the active site residue. ... Wlodawer A. Biochemistry 30:4663-4671(1991).
The S1 domain occurs in a ... The S1 domain has an OB-fold structure.

... Murzin AG; Cell 1997;88:235-242.

SAICAR synthetase signatures
Phosphoribosylaminoimidazole-succinocarboxamide synthase (SAICAR synthetase) catalyzes the seventh step of de novo purine biosynthesis. ... enzyme that also catalyzes ... SAICAR synthetase.

- Consensus pattern: ...-[FY]-E-F-G

- Consensus pattern: ...
... 42:259-287(1992).

589. Src homology 3 (SH3) domain profile
The SH3 domain is ... about 60 amino-acid residues, first identified as a

... subunit. - Mammalian Ras GTPase-activating protein.

... protein (gene dlg), mammalian tight junction ... mammalian synaptic proteins SAP ... kinases:

- L-type calcium channel beta regulatory ... NADPH oxidase: p47 (NCF-1) ...

- Some myosin heavy chains ... - Yeast actin-binding protein ABP1. ... BEM1. ... binding proteins

... family also contain a copy of the ATP/GTP-binding motif (P-loop). ... complexes.



[1465] 595. Staphylococcal nuclease homologue (SNase)

596. SPRY domain (domain in SPla and the Ryanodine receptor). Domain of unknown function.

... encoded by the FDFT1 gene. SQS seems to be membrane-bound. ... PSY catalyzes the conversion of two molecules of geranylgeranyl diphosphate (GGPP) into phytoene. It is the second step in the biosynthesis of carotenoids. The reaction carried out by PSY is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules ..., and the intermediate is then rearranged to form phytoene. PSY is found in all organisms that synthesize carotenoids: plants and photosynthetic bacteria as well as some non-photosynthetic bacteria and fungi. In bacteria, PSY is encoded by ... As can be seen from the description above, both SQS and PSY ... are localized in the central part of ...

- Consensus pattern: ...-[LIVAT]-[LIV]-G-x(2)-[LMSC]-x(2)-[LIV]

- Consensus pattern: Y-[CSAM]-...

- Consensus pattern: [LIVM]-G-x(3)-...-x(2,3)-N-[IF]-x-R-D-[LIV]-...
Robinson G.W., Tsay Y.H., Kienzle B.K., Smith-Monroy C.A., Bishop R.W. Mol. Cell. Biol. ...414-1421(1993).

[1] Summers C., Karst F., ...

... The signal recognition particle (SRP) ... mediates targeting and insertion of the signal sequence of exported proteins into the membrane of the endoplasmic reticulum. ... One of these subunits, the 54 Kd protein (SRP54), is a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. The N-terminal ... of SRP54 includes the GTP-binding domain (G-domain) and ... evolutionarily related to ... listed below:
- Escherichia coli ..., the prokaryotic counterpart of SRP54.
- ... Neisseria gonorrhoeae ..., which seems to be the homolog of ...; its G-domain is also at the C-terminus.
- Bacterial flagellar ...
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, h „ „ n< . ist of a core composed of a catalytic 
stricted ln ** rtw t to *J^! re listed be |ow. - Mammalian P^phatase-X (PP >Q. ai a js similar to PP2A, but 

^» 

, , „ o-» P. Ann, Rev. M- » -J* 1 ™, L FEBS L«. 268:355.359,1,90). 
re ,ated to hypothetical prote,ns ^JjJ^L been selected as a signature pattern. 
Serine and threonine dehydratases ... dehydratase (EC ...) catalyzes the ...

... attachment site]. Nakashima H., Nose K., Matsuda Y., ... Proc. Natl. Acad. Sci. U.S.A. ...

Consensus pattern: K-x-E-x(3)-... Tahara ...
... as well as S-receptor kinases are in ...
[1] Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: synonymous and nonsynonymous base substitutions. Hinata K, Watanabe M, Yamakawa S, ...; 1995;140:1099-1104.




Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic component composed of an FAD-binding flavoprotein and an iron-sulfur protein, and a hydrophobic component composed of a cytochrome b and a membrane anchor protein. The cytochrome b component is a mono-heme transmembrane protein. Members include: - Cytochrome b560 from the mammalian mitochondrial SDH complex. - Cytochrome b from the yeast mitochondrial SDH complex. - ...

Consensus pattern: W-...

... Kloareg B. ... (1995).
[1] ... family of proteins involved in synaptic transmission and general secretion. Halachmi N, Lev Z; J Neurochem 1996;66:889-897.
Number of members: 40

[1476] 605. Protein secE/sec61-gamma signature
In bacteria, the secE protein plays a role in protein translocation: it is one of the components, with secY and secA, of the protein translocation machinery. In eukaryotes, sec61-gamma plays a role in protein translocation through the endoplasmic reticulum; it is part of a ... complex. These proteins are ... amino acids long and contain a single transmembrane region.

... Hartmann E., Sommer T., Prehn S., Goerlich D., Jentsch S., Rapoport T.A. Nature ...
[1477] 606. 11-S plant seed storage proteins signature

Plant seed storage proteins, whose principal function appears to be the major nitrogen source for the developing plant, can be classified on the basis of their structure into different families. Each subunit of the 11-S proteins is composed of an acidic and a basic chain linked by a disulfide bond. Members of the 11-S family are pea and broad bean legumins, rape cruciferin, rice glutelins, ... globulin, ... globulin, sunflower helianthinin ... (acidic and basic subunits) ... position of the pattern ...

... storage proteins of most angiosperms and gymnosperms. The 7-S storage proteins are homotrimers.
Number of members: 67
...son A; Plant Physiol 1993;101:729-744.

[1481] 608. Aspartate-semialdehyde dehydrogenase ...

... from Asp to diaminopimelate and Lys, to ... and to ... ASA dehydrogenase is a protein of about 40 Kd (340 to 370 ...), with a conserved cysteine as well as a histidine ...

N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38) (AGPR) [1,2] is the enzyme that catalyzes ... to N-acetylglutamate 5-semialdehyde. In bacteria ...

The region around the ... residue is well conserved and can be used as a signature pattern.

... Nargang F.E., Weiss R.L. J. Biol. Chem. 269:...(1994).



[1483] 609. Sialyltransferase family
Number of members: 18

[1484] 610. SpoU rRNA methylase family
This family of proteins probably uses S-AdoMet. Number of members: 58
[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases. ...; Nucleic Acids Res 1993;21:5519.
[2] ... for tRNA (Gm18) 2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:4093-4097.



[1485] 611. Stathmin family
Stathmin [1] (from the Greek 'stathmos', which means relay) is an ubiquitous intracellular protein, present in a variety of phosphorylated forms, which serves as a relay integrating diverse second messenger pathways. Its expression and phosphorylation are correlated with extracellular signals regulating cell proliferation, differentiation and function. Structurally, it consists of an N-terminal domain of about 45 residues, followed by a 78-residue alpha-helical domain consisting of a heptad repeat coiled coil structure, and a C-terminal domain. SCG10 [2] is a neuron-specific, membrane-associated protein that accumulates in the growth cones of developing neurons. It is highly similar in its sequence to stathmin, but differs in that it contains an additional N-terminal domain which is probably responsible for its interaction with membranes. Two conserved regions have been selected as signatures for proteins of the stathmin family.
Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E-...

[1] ... [2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem. 268:16420-16429(1993).

[1486] 612. SUA5/yciO/yrdC family signature

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Yeast protein SUA5. - Escherichia coli hypothetical proteins yciO and yrdC, and HI0656, the corresponding Haemophilus influenzae protein. ...

Consensus pattern: [LIVMTA](3)-[LIVMFYC]-...
[1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995).
[1489] 613. Sucrose synthase
Sucrose synthase catalyzes the formation of sucrose from UDP-glucose and fructose. This family includes the bulk of ...

... transferase family (Glycos_transf_1).
[1490] 614. Sulfotransferase proteins
Number of members: 59

[1491] 615. Synaptophysin / synaptoporin signature

Synaptophysin and synaptoporin are integral membrane proteins of synaptic vesicles which may function as ionic or solute channels. Their N- and C-terminal sequences seem to be cytoplasmic. As a signature pattern for this family of proteins, a highly conserved region located just after the first transmembrane domain was selected.

... Scherer H., Betz H. Neuron 5:453-462(1990).

[1492] 616. Syndecans signature

Syndecans [1,2] (from the Greek syndein: to bind together) are a family of transmembrane heparan sulfate proteoglycans which are implicated in the binding of extracellular matrix components and growth factors. Syndecans bind a variety of molecules via their heparan sulfate chains and can act as receptors or co-receptors. Structurally, these proteins consist of four separate domains: a) a signal sequence; b) an extracellular domain (ectodomain) of variable length whose sequence is not evolutionarily conserved in the various forms of syndecans, and which contains the sites of attachment of the heparan sulfate chains; c) a transmembrane region; d) a highly conserved cytoplasmic domain of about 30 to 35 residues. The proteins known to belong to this family are: - ... - Syndecan 3 or neuroglycan or N-syndecan. - Syndecan 4 or amphiglycan ... - ... - Caenorhabditis elegans probable syndecan. The signature pattern for syndecans starts with the last residue of the transmembrane region; this region, which contains ... basic residues, could act as a stop transfer site.

[1] ... Spring J., Gallo R.L., Lose E.J. Annu. Rev. Cell Biol. ...(1992).
[2] David G. FASEB J. 7:1023-1030(1993).

[1493] 617. Syntaxin / epimorphin family signature

The following proteins have been shown to be evolutionarily related: - Epimorphin (or syntaxin 2), a mammalian mesenchymal protein which plays an essential role in epithelial morphogenesis. - Syntaxin 1A (also known as antigen HPC-1) and syntaxin 1B, which are synaptic proteins involved in the docking of synaptic vesicles at presynaptic active zones. - Syntaxin 3. - Syntaxin 4. - Syntaxin 5, which mediates ... - Syntaxin 6, which is involved in intracellular vesicle trafficking. - Syntaxin 7. - Yeast PEP12 (or VPS6), which is required for the transport of proteases to the vacuole. - Yeast SED5, which is required for the fusion of ... - Yeast SSO1 and SSO2, which are required for vesicle ... - ... assembly. - Arabidopsis thaliana protein KNOLLE. - Caenorhabditis elegans hypothetical proteins F35C8.4, F48F7.2, F55A... These proteins share the following characteristics: a size ranging from 30 Kd to 40 Kd; a C-terminal extremity which is highly hydrophobic and is probably involved in anchoring the protein to the membrane; and a central, well conserved region which is predicted to be in a coiled-coil conformation. The pattern is derived from a region within the coiled coil domain.

... 73:425-426(1993).

... and Sm2, separated by ... Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:...
... 1999;96:375-387.

[IS ^sS^Ka e »nW GJ ,,P^ N P;Sci^1« 2 84:4 5 5*1. 
[1498] 620. Protein secY signatures

The eubacterial secY protein interacts with the signal sequences of secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is an integral plasma membrane protein of ... residues that apparently contains ten transmembrane segments. Such a structure is in agreement with a role for secY in providing a channel for periplasmic and outer-membrane precursor proteins. Homologues of secY are found in archaebacteria [2]. SecY is also encoded in the chloroplast genome of some algae [3], where it could be involved in a prokaryotic-like protein export system across the two membranes of the chloroplast endoplasmic reticulum, which is present in chromophyte and cryptophyte algae.

Two signature patterns have been developed for secY proteins; ... fourth ... region, which is the most conserved part of the fifth transmembrane region.

... S.E. FEBS Lett. 298:93-96(1992).

[1499] 621. (Seed protein) Small hydrophilic plant seed proteins signature

The following small hydrophilic plant seed proteins are structurally related: - Arabidopsis thaliana ... - ... B19.1A, B19.1B, B19.3 and B19.4. - ... late embryogenesis abundant (LEA) protein D-19. - Carrot EMB-1 protein. - Maize embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice embryonic abundant protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein. These proteins ... seed for survival, maintaining ...

Consensus pattern: G-[EQ]-T-V-V-P-G-G-T

... Thomas T. ...

... Mol. Gen. Genet. 238:409-418(1993).

[1502] 622. Serine carboxypeptidases, active sites

All known carboxypeptidases are either metallo carboxypeptidases or serine carboxypeptidases. The catalytic activity of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1]. Proteins known to be serine carboxypeptidases are: - ... carboxypeptidases I, II and III [2]. - Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides. - Yeast KEX1 protease, involved in killer toxin and ... processing. - ... a probable carboxypeptidase involved in degrading or processing ... - Penicillium janthinellum carboxypeptidase S1 [4]. - Aspergillus niger carboxypeptidase cpdS. - Vertebrate protective protein / cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but is also essential for the activity of both beta-galactosidase and neuraminidase. - Mosquito ... - Naegleria fowleri virulence-related protein Nf314 ... - Caenorhabditis elegans hypothetical proteins ... This family also includes: - Sorghum (S)-hydroxymandelonitrile lyase ...

[2] ... Carlsberg Res. Commun. ...

... Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1503] 623. Serpins signature
Serine protease inhibitors (serpins) are a large family of structurally related proteins. They are high molecular weight (400 to 500 amino acids), extracellular, irreversible serine protease inhibitors with a well defined structural-functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine protease. Proteins known to belong to this family are listed below: - Alpha-1 protease inhibitor (alpha-1-antitrypsin, contrapsin). - ... - Heparin cofactor II. - Complement C1 inhibitor. - Plasminogen activator inhibitors 1 (PAI-1) and 2 (PAI-2). - Glia derived nexin (GDN) (Protease nexin I). - Protein C inhibitor. - Squamous cell carcinoma antigen (SCCA), which may modulate the host immune response against tumor cells. - A lepidopteran protease inhibitor. - Leukocyte elastase inhibitor, which, in contrast to other serpins, is an intracellular protein. - Neuroserpin [5], a neuronal inhibitor of ... - Cowpox virus CrmA [6], an inhibitor of the thiol protease interleukin-1B converting enzyme; CrmA is the only serpin known to inhibit a non-serine proteinase. - Some orthopoxvirus probable protease inhibitors, which may be involved in the regulation of the blood clotting cascade and/or of ...

On the basis of sequence similarities, a number of proteins with no known inhibitory activity are known to belong to this family: - Birds ovalbumin and the related genes X and Y proteins. - Angiotensinogen, the precursor of the angiotensin active peptide. - Barley protein Z, the major endosperm albumin. - Corticosteroid binding globulin (CBG). - Thyroxine-binding globulin (TBG). - Sheep uterine milk protein (UTMP) and pig uteroferrin-associated protein. - Hsp47, an endoplasmic reticulum heat-shock protein that binds strongly to collagen [7]. - Maspin, which seems to function as a tumor suppressor [8]. - Pigment epithelium-derived factor precursor (PEDF). - ... protein from Xenopus [9]. A signature pattern was developed from a conserved region located ten to fifteen residues on the C-terminal side of the reactive bond.

[1504] Consensus pattern: ...

Huber R., Carrell R.W. Biochemistry 28:8951-8966(1989).
Carrell R.W., Pemberton P.A., Boswell D.R. Cold Spring Harbor Symp. Quant. Biol. 52:527-535(1987).
Remold-O'Donnell E. FEBS Lett. 315:...(1993).
... Sonderegger P. EMBO J. ...
... Pickup D.J., Howard A.D., Thornberry N.A., ... Biochim. Biophys. ...
J. Biol. Chem. 267:7053-7059(1992).
[1505] 624. Sigma-54 interaction domain signatures

Some bacterial regulatory proteins activate the expression of genes from promoters recognized by core RNA polymerase associated with the alternative sigma-54 factor. These proteins have a conserved domain involved in the ATP-dependent [1,2] interaction with sigma-54. This domain has been found in the proteins listed below: - acoR from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC. - algB from Pseudomonas aeruginosa, an activator of alginate biosynthesis. - ..., an activator of ... transport protein. - dhaR from Citrobacter freundii, which controls the ... operon for glycerol utilization. - fhlA from Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural genes. - flbD from Caulobacter crescentus, an activator of flagellar genes. - hoxA from Alcaligenes eutrophus, an activator of the hydrogenase operon. - hrpS from Pseudomonas syringae ..., an activator of plant pathogenicity genes. - hydG from Escherichia coli and Salmonella typhimurium ... - hupR1 from Rhodobacter capsulatus, an ... - levR from Bacillus subtilis, which regulates the expression of the levanase operon (levDEFG and sacC). - nifA (and anfA and vnfA) from various bacteria, an activator of the nif nitrogen-fixing operon. - ntrC from various bacteria, an activator of nitrogen assimilatory genes such as that for glutamine synthetase (glnA) or of the nif operon. - pgtA from Salmonella typhimurium, the activator of the inducible phosphoglycerate transport system. - pilR from Pseudomonas aeruginosa, an activator of pilin gene transcription. - tyrR from Escherichia coli, involved in the transcriptional regulation of aromatic amino acid ... - xylR from Pseudomonas putida, an activator of the tol plasmid xylene catabolism operon xylCAB and of xylS. - Escherichia coli hypothetical protein yfhA. - Escherichia coli hypothetical protein yhgB.

About half of these proteins (... and pilR) belong to a family of signal transduction ... protein in their N-terminal section. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase activity, which may be required to promote a conformational change in the polymerase. This domain contains an ATP-binding site motif A (P-loop) as well as a form of motif B. Signature patterns have been developed for this domain: ... also conserved ...

... Dixon R. EMBO J. 11:2219-2228(1992).

[1506] 625. Sigma-70 factors family

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase to specific initiation sites and are then released. They alter the specificity of promoter recognition. Most bacteria express a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary sigma factor, and sigma-54 (gene rpoN or ntrA), direct the transcription of a wide variety of genes. The other sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard to sequence similarity, sigma factors can be grouped into two classes, the sigma-54 and sigma-70 families. The sigma-70 family includes, in addition to the primary sigma factor, a wide variety of alternative sigma factors, some of which are listed below: - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E (sigE or spoIIGB), sigma-F (sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or spo0H) and sigma-K (sigK or spoIVCB/spoIIIC). - Escherichia coli and related bacteria sigma-32 (gene rpoH or htpR), involved in the expression of heat shock genes. - Escherichia coli and related bacteria sigma-27, involved in the expression of the flagellin gene. - Escherichia coli sigma-S (gene rpoS or katF), which seems to be involved in the expression of genes required for protection against external stresses. - Myxococcus xanthus ..., which is essential for the late-stage differentiation of that bacterium.

Alignments of the sigma-70 family members reveal four regions of high conservation [2,3]. Each of these four regions can in turn be subdivided into a number of conserved sub-regions. Signature patterns based on the two best conserved sub-regions have been developed. The first pattern corresponds to sub-region 2.2; the exact function of this sub-region is not known, but it could be involved in the binding of the sigma factor to the core RNA polymerase. The second pattern corresponds to sub-region 4.2, which ... of promoters recognized by the major sigma factors. The second pattern starts one ...

[1507] 626. Signal carboxyl-terminal domain. Number of members: 430

[1508] 627. Signal peptidases I signatures

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins. In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the majority of exported pre-proteins; type II ...; and a third type involved in the processing of pili subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein protruding into the periplasmic space. Two residues have been shown to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2 (genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of signal peptides is effected by an oligomeric enzymatic complex known as the signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, share regions of sequence similarity with prokaryotic SPases I. Signature patterns for these proteins have been developed: the first contains the putative active site serine; another contains the putative active site lysine, which is not conserved in the SPC subunits; and another corresponds to a conserved region ... of all these proteins.

Consensus pattern: ... [K is an active site residue]

[1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. ...
... disulfide bond.
Consensus pattern: ...
These metal ligands are also ...; a region containing ... and a histidine has been selected as a signature.

... conserved cysteine involved in a disulfide bond.

(... encoded by the GLUT1 to GLUT7 genes). These integral membrane proteins show sequence similarity with a number of other sugar or metabolite transport proteins: - ... - Kluyveromyces lactis lactose permease. - Chlorella hexose carrier (gene HUP1). - ... - ... and Emericella nidulans ... transporter. - Leishmania donovani ... transporter. - Arabidopsis thaliana glucose transporter (gene STP1). - Spinach sucrose transporter. - Leishmania enriettii probable transport protein (LTP). - Yeast hypothetical proteins YBR241C, YCR98C and ... - Caenorhabditis elegans hypothetical protein ZK637... - Bacillus subtilis hypothetical protein ... - ... hypothetical proteins yabE, ...

... G-R-[KR] motif, but because this motif is too short ... and fifth segments.

Consensus pattern: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-[GSTA]

[4] ... Moore D.C.M., Henderson P.J.F. Nature 325:641-643(1987). [5] Henderson P.J.F. ...

[6] Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., ... 268-276(1993).

[1515] 633. Synaptobrevin signature

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function is not yet known, but which is highly conserved in mammals, ..., Drosophila and yeast [2]. In yeast there are two closely related forms, while in mammals there are at least four (genes SYB1, SYB2, SYB3 and SYBL1). Structurally, synaptobrevin ... As a signature pattern, a conserved region in the C-terminal part of the sequence was selected.

Consensus pattern: ...-[KR]-[LIVM]-[STDE]-x-[LIVM]-x-[DE]-[KR]-[TA]-[DE]

[2] ... Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992).

... GYP7_YEAST, which are GTPase ... GTPases. Number of members: 55

[1] Medline: 96032573. Molecular cloning of a cDNA ...
... Neuwald AF; Trends Biochem Sci 1997;22:243-244.

[1517] 635. Transcription factor TFIID repeat signature
Transcription factor TFIID (or TATA-binding protein, TBP) is a general factor that plays a major role in the activation of eukaryotic genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element, which lies close to the position of transcription initiation. There is a remarkable degree of sequence conservation of a C-terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region is necessary and sufficient for TATA box binding. The most significant structural feature of this domain is the presence of two conserved repeats of a 77 amino-acid region. The intramolecular symmetry generates a saddle-like structure that sits astride the DNA.

Consensus pattern: ...-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM]...

[1] Hoffmann A., ..., Roeder R.G. Nature 346:387-390(1990).
[2] Gasch A., Hoffmann A., ..., Chua N.-H. ... Nature 346:390-394(1990).
[3] Nikolov D.B., ...; ... Jan Y.N., Jan L.Y., Tjian R. Nature 361:...

[1518] 636. Translationally controlled tumor protein signatures
Mammalian translationally controlled tumor protein (TCTP) is a protein which has been found to be preferentially synthesized in cells during the early growth phase of some types of tumor, but which is also expressed in normal cells. The physiological function of TCTP is still not known. TCTP is a protein of 18 to 20 Kd. Close homologues have been found in ..., Caenorhabditis elegans (F52H2.11), Hydra, budding yeast (YKL056c) [5] and fission yeast (SPAC1F12.02c). Two regions have been selected as signature patterns.

... Bielka H. Biochem. Int. 19:277-286.

... reactions generate one-carbon derivatives of tetrahydrofolate, which are interconverted by methylenetetrahydrofolate dehydrogenase, methenyltetrahydrofolate cyclohydrolase and formyltetrahydrofolate synthetase. The dehydrogenase and cyclohydrolase activities are expressed by a variety of multifunctional enzymes. Eukaryotic C1-tetrahydrofolate synthase (C1-THF synthase) catalyzes all three reactions. It exists in two forms: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms the dehydrogenase/cyclohydrolase domain is located in the N-terminal section of the ... amino acid protein and consists of ... amino acid residues. The C1-THF synthases are NADP-dependent. The eukaryotic mitochondrial bifunctional dehydrogenase/cyclohydrolase is an enzyme of about 300 amino acid residues ...

The region around the active site residue is perfectly conserved.

[1] ... Alber T., Davenport R., ...
Consensus pattern: ... Munoz M., Bockamp ...

... DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz C, Summers WC, Sanderson MR;
Nat Struct Biol 1995;2:876-881.
Number of members: 65

[1524] 642. Nuclear transition protein 2 signatures

In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is associated with a double-protein transition. The first transition corresponds to the replacement of histones by several spermatid-specific proteins, also called transition proteins, which are themselves replaced by protamines during the second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific proteins. TP2 is a basic, zinc-binding protein [1] of 116 to 137 residues. Structurally, TP2 consists of three distinct parts: a conserved serine-rich N-terminal domain of about 25 residues, a variable central domain of 20 to 50 residues which contains ..., and a conserved C-terminal domain of about 70 residues rich in lysines and ...

Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S
Consensus pattern: ...
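The consensus patterns in this section are written in PROSITE pattern notation. As an illustration only, the sketch below translates that notation into a Python regular expression and scans a sequence for the first TP2 pattern. The function name `prosite_to_regex` and the test sequence are hypothetical. The translation covers only the elements that occur in this section (single residues, `x`, `[...]`, `{...}`, repeat counts such as `(3)` or `(2,3)`, and the `<`/`>` terminal anchors); it is not a complete PROSITE parser.

```python
import re

# One PROSITE element: optional N-terminal anchor '<', a residue class,
# an optional repeat count "(n)" or "(n,m)", and an optional C-terminal anchor '>'.
ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_anchor, core, low, high, c_anchor = match.groups()
        if core == "x":
            core = "."                      # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} = any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_anchor else "") + core + ("$" if c_anchor else ""))
    return "".join(parts)


tp2_regex = prosite_to_regex("H-x(3)-H-S-[NS]-S-x-P-Q-S")
print(tp2_regex)  # H.{3}HS[NS]S.PQS

hit = re.search(tp2_regex, "MAAHKTRHSNSAPQSGG")  # hypothetical test sequence
print(hit.start() + 1, hit.group())  # 4 HKTRHSNSAPQS
```

A repeat count with a range, such as x(2,3), becomes `.{2,3}`, so the regular-expression engine handles variable-length gaps without any extra code.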

A number of enzymes require thiamine pyrophosphate (TPP) as a cofactor. Some of these enzymes are structurally related: - Pyruvate oxidase (POX). Reaction catalyzed: pyruvate + orthophosphate + O(2) + H(2)O = acetyl phosphate + CO(2) + H(2)O(2). - Pyruvate decarboxylase (PDC) (EC 4.1.1.1). Reaction catalyzed: pyruvate = acetaldehyde + CO(2). - Indolepyruvate decarboxylase (EC 4.1.1.74) [2]. Reaction catalyzed: indole-3-pyruvate = indole-3-acetaldehyde + CO(2). - Benzoylformate decarboxylase (BFD). - Acetolactate synthase (ALS). Reaction catalyzed: 2 pyruvate = 2-acetolactate + CO(2). A conserved region which is located in ...
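Three of the TPP-dependent reactions named above release CO(2): pyruvate = acetaldehyde + CO(2) (pyruvate decarboxylase), indole-3-pyruvate = indole-3-acetaldehyde + CO(2) (indolepyruvate decarboxylase) and 2 pyruvate = 2-acetolactate + CO(2) (acetolactate synthase). As a quick consistency check, which is not part of the original text, the sketch below counts atoms on both sides of each reaction. It assumes the neutral acid forms and these molecular formulas: pyruvic acid C3H4O3, acetaldehyde C2H4O, indole-3-pyruvic acid C11H9NO3, indole-3-acetaldehyde C10H9NO and 2-acetolactic acid C5H8O4. The helper names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C11H9NO3'."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side: list[tuple[int, str]]) -> Counter:
    """Sum atom counts over (stoichiometric coefficient, formula) pairs."""
    total: Counter = Counter()
    for coefficient, formula in side:
        for element, number in atom_counts(formula).items():
            total[element] += coefficient * number
    return total


def is_balanced(left: list[tuple[int, str]], right: list[tuple[int, str]]) -> bool:
    """True when both sides of a reaction contain the same atoms."""
    return side_counts(left) == side_counts(right)


# Assumed neutral acid forms for the 2-oxo acids and their products.
REACTIONS = {
    "pyruvate decarboxylase": ([(1, "C3H4O3")], [(1, "C2H4O"), (1, "CO2")]),
    "indolepyruvate decarboxylase": ([(1, "C11H9NO3")], [(1, "C10H9NO"), (1, "CO2")]),
    "acetolactate synthase": ([(2, "C3H4O3")], [(1, "C5H8O4"), (1, "CO2")]),
}

for name, (left, right) in REACTIONS.items():
    print(f"{name}: balanced = {is_balanced(left, right)}")
```

Each reaction balances because the substrate loses exactly one CO(2); in the acetolactate synthase reaction the remaining five carbons from the two pyruvates stay in 2-acetolactate.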

[1526] 644. TPR domain
[1] 

Medline: 95397415
Tetratrico peptide repeat interactions: to TPR or not to TPR? 
Lamb JR, Tugendreich S, Hieter P; 
Trends Biochem Sci 1995;20:257-259. 



interactions. 
Das AK, Cohen PW, Barford D; 

EMBO J 1998;17:1192-1199.
Number of members: 621 
In Escherichia coli and related bacteria, ... is a multifunctional protein composed of ... and a C-terminal domain which has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin (for example cobI/cbiL) ...

Consensus pattern: V-...

[1] Blanche F., Robin C., Couder M., ... J. Bacteriol. 173:4637-4645(1991).
[2] Robin C., Blanche F., Cauchois L., Cameron B., Couder M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).

[3] Crouzet J., Cameron B., Cauchois L., Rigault S., ... J. Bacteriol. ...
... Church G.M. J. Bacteriol. 175:...

... ambiguous in the alignment. Number of members: 18
... Trends Biochem Sci 1997;22:...

Medline: 9720056 ... Tudor domains ...
... organization and relationship to the ...

... Mornon JP; Biochem J 1997;321:125-132.
[1529] 647. Terpene synthase family
It has been suggested that this gene family be designated tps (for terpene synthase) [1]. It has been split into six subgroups, tpsa to tpsf.

tpsa includes vetispiridiene synthase, Swiss:Q39979, 5-epi-...

tpsb includes (-)-limonene synthase, Swiss:Q40322.
tpsc includes kaurene synthase A, Swiss:O04408.

tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase, Swiss:O24475, and myrcene synthase, Swiss:O24474.

tpse includes kaurene synthase B.

tpsf includes linalool synthase.
Number of members: 51

Bohlmann J, Steele CL, Croteau R;

J Biol Chem 1997;272:21784-21792.

ThiF/MoeB/HesA family. Number of members: 87
[1531] 649. Thioester dehydrase

Members of this family are involved in fatty acid biosynthesis.
Number of members: 19


Database reference: PFAMB; PB058036;

[1532] 650. Tub family signatures
... resistance and sensory deficits. This mutation ...
Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q

Consensus pattern: ...

[1] Kleyn P.W., Fan W., Kovats S.G., Lee J.J., Pulido J.C., Wu Y., Berkemeier L.R., ..., Holmgren L., Charlat O., Woolf E.A., Tayber O., Brody T., Shu P., Hawkins F., ..., Duyk G.M., Moore K.J. Cell 85:281-290.
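The first Tub family pattern, F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q, contains no repeat counts, so every match is exactly 14 residues long. For fixed-length patterns like this one a regular-expression engine is not strictly needed. The illustrative sketch below turns each element into a set of allowed residues and slides a 14-residue window along the sequence; the function names and the test sequence are hypothetical.

```python
from __future__ import annotations


def parse_fixed_pattern(pattern: str) -> list[frozenset[str] | None]:
    """Convert a fixed-length PROSITE-style pattern into per-position residue sets.

    'x' becomes None (any residue), '[KHQ]' becomes {'K', 'H', 'Q'} and a
    single letter becomes a one-element set. Repeat counts are not supported.
    """
    positions: list[frozenset[str] | None] = []
    for element in pattern.split("-"):
        if element == "x":
            positions.append(None)
        elif element.startswith("[") and element.endswith("]"):
            positions.append(frozenset(element[1:-1]))
        elif len(element) == 1 and element.isupper():
            positions.append(frozenset(element))
        else:
            raise ValueError(f"unsupported element: {element!r}")
    return positions


def window_hits(sequence: str, pattern: str) -> list[int]:
    """Return the 0-based start of every window that satisfies the pattern."""
    positions = parse_fixed_pattern(pattern)
    width = len(positions)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(allowed is None or residue in allowed
               for residue, allowed in zip(window, positions)):
            hits.append(start)
    return hits


TUB_PATTERN_1 = "F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q"
print(window_hits("MSFKGRVSDASVKNFQAA", TUB_PATTERN_1))  # [2]
```

Because every window is tested, overlapping matches are reported as well, which a plain `re.finditer` scan would skip.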

[1533] 651. Eukaryotic DNA topoisomerase I

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion of topological DNA isomers. Type I topoisomerases act by catalyzing the transient breakage of DNA, one strand at a time, and the subsequent rejoining of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone bond, the hydroxyl group of a tyrosine residue is joined to a 3'-phosphate on DNA.

Consensus pattern: ...
... Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).
[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1989).
[3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989).
[4] Roca J. Trends Biochem. Sci. 20:156-160(1995).[E1]

[1534] 652. Transaldolase signatures

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose 7-phosphate to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together with transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an enzyme of about 34 Kd whose sequence has been well conserved throughout evolution. A lysine has been implicated [1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the carbonyl group of fructose-6-phosphate.

... Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).

[1538] 654. Trehalase signatures

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose, yielding two glucose subunits [1]. It is an enzyme found in a wide variety of organisms and whose sequence has been highly conserved throughout evolution. Two conserved regions have been selected as signature patterns.

Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y
... 781-788(1993).[E1]
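Signature patterns such as the trehalase pattern P-G-G-R-F-x-E-x-Y-x-W-D-x-Y are usually applied to many sequences at once, typically read from a FASTA file. The minimal sketch below assumes FASTA input and uses made-up record names and sequences. It handles only patterns built from single residues, `x` and `[...]` classes (no repeat counts) and reports 1-based start positions.

```python
from __future__ import annotations

import re


def simple_pattern_to_regex(pattern: str) -> str:
    """Handle only single residues, 'x' and '[...]' classes (no repeat counts)."""
    return "".join("." if element == "x" else element for element in pattern.split("-"))


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record identifier (first word after '>') to its sequence."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def scan_fasta(text: str, pattern: str) -> dict[str, list[tuple[int, str]]]:
    """Return (1-based start, matched residues) for every hit in every record."""
    regex = re.compile(simple_pattern_to_regex(pattern))
    return {
        name: [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]
        for name, sequence in parse_fasta(text).items()
    }


FASTA = """>seqA hypothetical example
MKPGGRFAEAY
AWDAYL
>seqB hypothetical example
MKKLLV
"""
print(scan_fasta(FASTA, "P-G-G-R-F-x-E-x-Y-x-W-D-x-Y"))
# {'seqA': [(3, 'PGGRFAEAYAWDAY')], 'seqB': []}
```

Sequence lines are joined before matching, so a motif that is split across two FASTA lines, as in seqA, is still found.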

... phate synthase/phosphate complex [1].

[1541] [1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15.
[1542] 656. Tropomyosins signature
Tropomyosins are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle, tropomyosin mediates the interactions between the troponin complex and actin so as to regulate muscle contraction. The role of tropomyosin in smooth muscle and non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein that forms a coiled-coil dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues, whereas non-muscle forms are generally smaller and are heterogeneous ...

Number of members: 81
[1] Medline: 87144593

Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.
[2] Medline: 95155315

A direct regulatory role for troponin T and a dual role for
troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;

J Biol Chem 1995;270:2557-2562.

[3] Medline: 95324796
The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;

FASEB J 1995;9:755-767.

[1544] 658. (Tryp_mucin) Mucin-like glycoprotein

[1545] This family of trypanosomal proteins ... The protein consists of three regions. The ...

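Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.

A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for ...

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]
... Kula M., Mackie G.A., Schimmel P. ...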



EP 1 033 405 A2 



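... different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.

A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for ..., tyrosine, tryptophan ...

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T.A., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1989). [4] Delarue M., Moras D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).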
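In the class I signature the element x(0,2) lets the distance between the leading proline and the rest of the motif vary, so one hit spans 10 to 12 residues. The sketch below assumes the pattern is the PROSITE class I signature P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC] and writes it out by hand as a verbose Python regular expression, one element per line, reporting how many residues the variable gap absorbed. The function name and the test sequences are hypothetical.

```python
from __future__ import annotations

import re

# Class I signature written out element by element; re.VERBOSE ignores the
# layout whitespace and the comments.
CLASS_I_SIGNATURE = re.compile(
    r"""
    P
    (?P<gap>.{0,2})   # x(0,2): zero to two residues of any kind
    [GSTAN]
    [DENQGAPK]
    .                 # x: any residue
    [LIVMFP]
    [HT]
    [LIVMYAC]
    G
    [HNTG]
    [LIVMFYSTAGPC]
    """,
    re.VERBOSE,
)


def find_class_i_signature(sequence: str) -> list[tuple[int, str, int]]:
    """Return (1-based start, matched residues, gap length) for each hit."""
    return [
        (m.start() + 1, m.group(), len(m.group("gap")))
        for m in CLASS_I_SIGNATURE.finditer(sequence.upper())
    ]


print(find_class_i_signature("MKLPYANGSIHLGHMLEH"))  # [(4, 'PYANGSIHLGHM', 2)]
print(find_class_i_signature("MKLPNGSIHLGHMLEH"))    # [(4, 'PNGSIHLGHM', 0)]
```

The quantifier `.{0,2}` is greedy, so the engine tries a two-residue gap first and backtracks to shorter gaps only when the rest of the motif fails to match.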
[1548] 661. (tRNA-synt_1c) tRNA synthetases class I (E and Q)
[1549] Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only glutamyl and glutaminyl tRNA synthetases. In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).

[1550] [1] Rath VL, Silvian LF, Beijer B, Sproat BS, Steitz TA; Structure 1998;6:439-449.
[1551] 662. (tRNA-synt_1d) tRNA synthetases class I (R)
[1552] Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only arginyl tRNA synthetase.

... [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: ...

Consensus pattern: ...
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). ... Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:...

[1554] ...

... these enzymes have a ...

A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity ...; the 'HIGH' motif is very well conserved. The 'HIGH' region has ...

... Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. ... [4] Delarue M., Moras D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[1555] 665. Aminoacyl-transfer RNA synthetases class-II signatures

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.

The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine are referred to as class-II synthetases and probably have a common folding pattern in their catalytic domain, which is different from the Rossmann fold observed for the class I synthetases.

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). ... [3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S. ... [7] Cusack S., ..., Nassar N., Leberman R. Nature 347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., ...

[1556] 666. Thaumatin family signature

Thaumatin [1] is an intensely sweet-tasting protein (100,000 times sweeter than sucrose on a molar basis) from Thaumatococcus daniellii, an African bush. Thaumatin contains eight disulfide bonds.

A number of proteins have been found to be structurally related to thaumatin; some of them are listed below (references are only provided for recently determined sequences):
- A maize bifunctional alpha-amylase/trypsin inhibitor.
- Two tobacco pathogenesis-related proteins, PR-R major and minor forms, which are induced after infection with viruses.
- Salt-induced protein osmotin from tobacco.
- Osmotin-like proteins OSML13, OSML15 and ...

Snsensus pattern: ^^"^^aj^n M Y.. Visser C, Verrips C.T. Gene 18:1-12(1982). 
Shah D.KL; Plant Physiol. 106:1471-1481(1994). 

[1557] 667. Thiolases signatures

Two different types of thiolase are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase (EC 2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation. Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and is involved in biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In eukaryotes, there are two forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes.

There are two conserved cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base involved in deprotonation in the condensation reaction. Mammalian nonspecific lipid-transfer protein (nsL-TP), also known as sterol carrier protein 2 (SCP-2), is also related to this family.

[1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989). [2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G.,
A number of eukaryotic proteins ... reticulum enzyme that catalyzes ...

... Escherichia coli dsbA (or prtA) and its orthologs in Vibrio ...

... also contain a thioredoxin domain. ... dsbC (or xprA) and its orthologs in ...
W.o , neiiiueiy ^- 8 2879 2890(1994) 

- Myocyte-specific enhancer factor 2 (MEF2), forms 2A to 2D (MEF2A to MEF2D). These proteins are transcription factors ...

- Yeast arginine metabolism regulation protein I (gene ARGR1 or ARGRI), a transcription factor involved in regulating ...
- Arabidopsis thaliana homeotic proteins Apetala1 (AP1), ... the stamens by petals and the carpels by a new flower ..., and Pistillata (PI), which act locally to ... and to determine sepal and petal development [4].
- Antirrhinum majus proteins deficiens (DEFA) and globosa (GLO). These proteins are transcription factors involved in the genetic control of flower development. Mutations in DEFA or ...

fsTZr.x(4H^ 2] P»«n=™ S.. Maine G.T., EWeB., 

... to an aldose acceptor, such as ribose 5-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic and prokaryotic sources [1,2] show that the enzyme has been evolutionarily conserved. In the peroxisomes of methylotrophic yeasts there is a highly related enzyme, dihydroxyacetone synthase (DHAS), also known as formaldehyde transketolase, which exhibits a very unusual specificity by including formaldehyde among its substrates.

1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far found in bacteria (gene dxs) and plants (gene CLA1) which catalyzes the thiamin pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose 5-phosphate (DXP), a precursor in the biosynthetic pathway to ... DXP synthase is evolutionarily related to TK.

Two regions ... The first ... during catalysis [4]. The second, located in the ...

^pattern: B- X (3HUV S 
1992H21 Fletcher T.S., Kwee I.L, Nakada T Largman iu , ■» BegleyT.P, Bringer-Meyer S., Sahm 

j 11-2373-2379(1992). 

[1562] 672. Transmembrane 4 family signature

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily related [1,2,3]. The proteins known to belong to this family are listed below:
- Mammalian antigen CD9, which may be involved in platelet activation and aggregation.
- Mammalian leukocyte antigen CD37.
- Mammalian leukocyte antigen CD53 (OX-44), which may be involved in growth regulation in hematopoietic cells.
- Mammalian lysosomal membrane protein CD63 (melanoma-associated antigen ME491).
- Mammalian antigen CD81 (cell surface protein TAPA-1), which may play an important role in the regulation of lymphoma cell growth.
- Mammalian antigen CD82 (protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory signals for the TCR/CD3 pathway.
- Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3)).
- Mammalian cell surface glycoprotein A15 (TALLA-1).
- Human tumor-associated antigen CO-029.
- Schistosoma 23 Kd surface antigen (SM23 / SJ23).

These proteins share the following characteristics: they are type III membrane proteins that contain an N-terminal transmembrane domain which is not cleaved during biosynthesis and which functions both as a translocation signal and as a membrane anchor; they also contain three additional transmembrane regions, at least seven conserved cysteine residues, and are of approximately the same size (218 to 284 residues). These proteins are collectively known as the 'transmembrane 4 superfamily' (TM4) because they span the plasma membrane four times. A schematic representation of their domain structure is shown below.

+-----+-----+-------+-----+-----+-----+---------------+-----+-----+
| Cyt | TM1 | Extra | TM2 | Cyt | TM3 | Extracellular | TM4 | Cyt |
+-----+-----+-------+-----+-----+-----+---------------+-----+-----+
'Cyt': cytoplasmic domain. 'TMx': transmembrane region. 'Extra': extracellular domain.
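The four-span topology above follows from simple parity: starting from a cytoplasmic N-terminus, as for these type III membrane proteins, every transmembrane crossing switches sides, so an even number of spans returns the C-terminus to the cytoplasm. Below is a minimal Python sketch of that bookkeeping. The function name `membrane_topology` and the loop labels are hypothetical.

```python
from typing import List, Tuple

FLIP = {"cytoplasmic": "extracellular", "extracellular": "cytoplasmic"}


def membrane_topology(n_tm: int, n_terminus: str = "cytoplasmic") -> List[Tuple[str, str]]:
    """List (segment, location) pairs for a protein that crosses the membrane n_tm times.

    n_tm is assumed to be at least 1. Each transmembrane segment moves the chain
    to the opposite side, so the location of every loop follows from the parity
    of the crossings before it.
    """
    side = n_terminus
    segments = [("N-terminus", side)]
    for i in range(1, n_tm + 1):
        segments.append((f"TM{i}", "membrane"))
        side = FLIP[side]  # one crossing flips the side
        label = "C-terminus" if i == n_tm else f"loop {i}"
        segments.append((label, side))
    return segments


for name, where in membrane_topology(4):
    print(f"{name:<11} {where}")
```

With four spans, loops 1 and 3 come out extracellular and both termini cytoplasmic, which matches the schematic; the larger extracellular domain corresponds to loop 3, between TM3 and TM4.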
Consensus pattern: ...-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]

- Hepatocyte growth factor activator.
- Prostate specific antigen.
- ... (Wegener's autoantigen).
- Snake venom proteases: batroxobin, cerastobin, flavoxobin, ...
- Apolipoprotein(a).
- ... protease from Atlantic sand fiddler crab.
- ... trypsin-like proteases ...

55 Consensus pattern. IDNbiMouj i eiM994HE1l 
[1566] 676. (tsp) Thrombospondin type 1 domain

units bind directly (o^^^^ 

Consensus pattern. <M-R-[DEHIL] 3 3 9 _ 34 3 (1 g 8 8). eir , natur es Aminoacyl-tRNA synthetases 

[1] Cleveland D.W. Trends Biochem. Sci. ...

678. (tRNA-synt...) Aminoacyl-tRNA synthetases class-II signatures
Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. ... are referred ...

347:249-255(1990). [8] Leveque F., Plateau P., ...
^kxi.lou^andlh.^^P^y KW , h H.V-lVpr.Die Ck mannT, Withers- 

Ubiquitin catboxyMerm.nal !^"*^" J g^iaa ol ut»qu». TO» '"^.J ol UCH. The tirst clan 

from the region around that resiou . 
[1] Jentsch S., Seufert W., Hauser H.P. Biochim. Biophys. Acta 1089:127-139(1991).
[2] D'Andrea A., Pellman D. Crit. Rev. Biochem. Mol. Biol. 33:337-352(1998).
[3] ... EMBO J. ...(1994).

Ubiquitin carboxyl-terminal hydrolases (UCH) ... These enzymes are involved in the processing of ...

- Human isopeptidase T.
- Human isopeptidase T-3.
- Drosophila ... Ubp-64E.
- Caenorhabditis elegans hypothetical protein R10E11.3.
- Caenorhabditis ...

These proteins share two regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic mechanism. The second region contains two conserved histidines, one of which is also probably implicated in the catalytic mechanism.

Consensus pattern: ... [C is the putative active site residue]
Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-...-G-H-Y [The two H's are putative active site residues]

UDP-glycosyltransferases ... transfer a sugar to a small hydrophobic molecule. The family includes:

- UDP-glucuronosyltransferases (UDPGT) [1,2], a large family of enzymes which catalyze the transfer of glucuronic acid to a wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major importance in the detoxification and subsequent elimination of xenobiotics such as drugs and carcinogens.
- A large number of putative UDPGT from Caenorhabditis elegans.
- Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyltransferase [3]. This enzyme catalyzes the transfer of galactose to ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant sphingolipids of the myelin membrane of the central and peripheral nervous system.
- Plant flavonol O(3)-glucosyltransferase, which catalyzes the transfer of glucose from UDP-glucose to a flavonol. This reaction is essential and is one of the last steps in anthocyanin pigment biosynthesis.
- Baculovirus ecdysteroid UDP-glucosyltransferase (EC 2.4.1.-) [5] (egt), which catalyzes the transfer of glucose from UDP-glucose to ecdysteroids, which are insect molting hormones. The expression of egt in the insect host interferes with normal insect development by blocking the molting process.
- Prokaryotic zeaxanthin glucosyltransferase (gene crtX), an enzyme involved in carotenoid biosynthesis that catalyzes the glycosylation of zeaxanthin to a zeaxanthin glucoside.
- Streptomyces macrolide glycosyltransferases, which inactivate macrolide antibiotics via 2'-O-glycosylation using UDP-glucose.

A signature pattern has been extracted to detect them.
... Press, Boca Raton,
., Mulder G.J., Chow- 




osis hypothetical protein MtCY270.20. - 9 T^SS^vS^ alginolyticus hypothetical prote.n .n pit 5 re- 
Consensus P att ®^ ^"^p^y^h Jd observations (1 996). 


- Archaeoglobus fulgidus hypothetical protein AF...

The sizes of these proteins range from 30 to 120 Kd. They all contain a number of transmembrane regions.
Consensus pattern: G-...

[1] Bairoch A. Unpublished observations.

... UPF0004 signature ... Escherichia coli hypothetical ... Helicobacter pylori hypothetical protein HP0285 ... Pseudomonas aeruginosa ... hypothetical protein

J5 Consensus pattern: [LIVMl-x-[LIVM j M 

[1] Bairoch A. Unpublished observations. ... UPF0005 signature

[1] Walter L., Marynen P., ... UPF0006 signatures ... similarities: - Yeast chromosome II ...
[1588] ... Uncharacterized ...

The following uncharacterized proteins ... Escherichia coli hypothetical protein ycfH and HI0454, ... HI0081
pattemS ' « rn- fl |VMFYl(2)-D-[STA]-H-x-H-[LIVMFl-[DN 

Consensus pattern: ...

[1] Bairoch A. Unpublished observations. ... UPF0015 signature. Yeast chromosome II ...

[1590] 693. Uncharacterized ...

The following P £ ast chromosome XIII hypothec! pyl0 ri hypothetical prote.n 

hypothetical P^^^^esponding Haemophilus influenza .e R P ro f " ow H n ^ C a 0 " terium P g, u tamicum hypothetical 
50 protein yaeU and HI0920, the com** _ a B 1937_F2_65. - A Co ^ n ° b * Ct °™ | . Synec hocystis strain 

HP1?21 . - Mycobacterium lepr e nypol hetical protein in tWf"^- Th ese are proteins 
- Fission yeast hypothetical protein ...

These proteins can be picked up in the database by the following pattern, which is located ...

35 Consensus pattern. D-V-IUV] x W i 

The following "J*"*JJ£ P yeast cnrom0 some IV hypothetical prote.n YDL1 77C t ^ tnetica , protein 

» 5^5aS==^=^.^e^ase b y 



EP 1033 405 A2 

Unpublished observations (1996).

[1602] 702. Uncharacterized protein family ... signature
The following uncharacterized proteins have been shown to share regions of similarities:
- Escherichia coli hypothetical protein yhdG and HI0979, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yjbN and HI0634, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yohI and HI0270, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein yacF.
- Rhodobacter capsulatus protein ..., and related proteins from Azospirillum brasilense and Rhizobium leguminosarum.
- Synechocystis strain PCC 6803 hypothetical protein ...
- Yeast protein SMM1.
- Yeast hypothetical protein ...

RG Mol Microbiol. 8:903-914(1993). 
[1603] 703. Uncharacterized protein family ... signature
The following uncharacterized proteins have been shown to share regions of similarities:
- Escherichia coli hypothetical protein yacE and HI0890, the corresponding Haemophilus influenzae protein.
- Mycobacterium tuberculosis hypothetical protein ...
- Synechocystis strain PCC 6803 hypothetical protein slr0553.
- Other hypothetical proteins from ..., Neisseria gonorrhoeae, Pseudomonas ... and Xanthomonas campestris.
- Human hypothetical protein pOV-2.
- Yeast hypothetical protein ...
- Caenorhabditis elegans hypothetical protein T05G5.5.
These proteins all contain an ATP/GTP-binding motif A (P-loop).



vations(1997). 

[1604] 704. Ubiquitin-conjugating enzymes active site

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) catalyze the covalent attachment of ubiquitin to target proteins. An activated ubiquitin moiety is transferred from a ubiquitin-activating enzyme (E1) to E2, which later ligates ubiquitin directly to substrate proteins, with or without the assistance of 'N-end' recognizing proteins (E3). In most species there are many forms of UBC (at least 9 in yeast) ... UBC's and the region ...

[1605] 705. Uroporphyrinogen decarboxylase signatures
Uroporphyrinogen decarboxylase (URO-D) catalyzes the decarboxylation of the four acetate side chains of uroporphyrinogen ...

In humans, URO-D deficiency is responsible for ... hepatoerythropoietic porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution. The best conserved region is located in the N-terminal section; it contains a ... There are two arginine residues in this ... the carboxyl groups of the propionate side chains ...
... methylation of ... biosynthesis methyltransferase. - Bacillus ...

They can be picked up in the database by the following patterns.
[1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997).

, S used as a signature J,;^,^^^* [FY].x(2)-L..(5)-R 
... to DNA [2]. Number of members: 39

p, Express^, and ,ofo 0, foe anfcersa, areas pro,* UapA. p. Escberie* coli *r,n 9 a-o* _L Nys.ro. T, 

... EV; Genetics 1996;144:817-828.

[1610] 709. Ubiquitin domain signature and profile

Ubiquitin [1,2,3] is a protein of seventy-six amino acids found in all eukaryotic cells, whose sequence is extremely well conserved from protozoa to vertebrates. It plays a key role in a variety of cellular processes, such as ATP-dependent selective degradation of cellular proteins, maintenance of chromatin structure, regulation of gene expression, stress response and ribosome biogenesis.

In most species there are many genes coding for ubiquitin. However, they can be classified into two classes. The first class produces polyubiquitin molecules consisting of exact head to tail repeats of ubiquitin. The number of repeats is variable (up to twelve in a Xenopus gene). In the majority of polyubiquitin precursors, there is a final amino acid after the last repeat. The second class of genes produces precursor proteins consisting of a single copy of ubiquitin fused to a C-terminal extension protein (CEP). There are two types of CEP proteins and both seem to be ribosomal proteins.

Ubiquitin is a globular protein, the last four C-terminal residues (Leu-Arg-Gly-Gly) extending from the compact structure to form a 'tail', important for its function. The latter is mediated by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage between the C-terminal glycine and the epsilon amino group of lysine residues in the target proteins.

There are a number of proteins which are evolutionarily related to ubiquitin:
- Ubiquitin-like proteins from ... bovine viral diarrhea virus (BVDV). These proteins are highly similar to their eukaryotic counterparts.
- Mammalian protein GDX [4]. GDX is composed of two domains, an N-terminal ubiquitin-like domain of 74 residues and a C-terminal domain with some similarity to the thyroglobulin ...

- ... a fusion protein which consists of an N-terminal ubiquitin-like protein of 74 residues ...
- ... [6], a ubiquitin-like protein of 81 residues.
- Human protein ..., a large fusion protein of 1132 residues that contains an N-terminal ubiquitin-like domain.
- ... a fusion protein which consists of an N-terminal ubiquitin-like protein of 70 residues ...
- DNA repair protein RAD23 [8]. RAD23 contains an N-terminal domain that seems to be distantly, yet significantly, related to ubiquitin.
- Mammalian RAD23-related proteins RAD23A and RAD23B.
- ... a protein of 274 residues that contains a central ubiquitin-like domain.
- ... spliceosome associated protein 114 (SAP 114 or SF3A120).
- Yeast protein DSK2, a protein that contains an N-terminal ubiquitin-like domain.
- Human protein ..., Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical protein F53F4.3, which contain an N-terminal ubiquitin-like domain and a C-terminal CAP-Gly domain.
- Schizosaccharomyces pombe hypothetical protein ... This protein contains an N-terminal ubiquitin domain.
- Yeast protein SMT3 and human protein SMT3C (also known as PIC1; Ubl1; Sumo-1; GMP1), ... targeting ranGAP1 to the nuclear pore complex protein ranBP2.
- ... Caenorhabditis elegans ...
To identify ubiquitin and related proteins, a pattern has been developed based on conserved positions in the central section of these domains.
Consensus pattern: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-...
[1]JentschS.,SeufertW.,Hauser^ 

ki. Biotechnology 8:209-21 5(1 990).[ 3 w^l ,.G. J. Biol. Chem. 268:17967-17974(1993). 

M., Tribioli C Toniolo D. Genomics 7:463^(1990^ 95.393.399(1 993).[ 7] Jones D., Candido E. 

Number of members: 27

[1613] 711. Vinculin family signatures

Vinculin is a eukaryotic protein that seems to be involved in the attachment of the actin-based microfilaments to the plasma membrane. Vinculin is located at the cytoplasmic side of focal contacts. In addition to actin, vinculin interacts with other structural proteins such as talin and alpha-actinin. Vinculin is a large protein of 116 Kd (about 1000 residues). Structurally, it consists of an acidic N-terminal domain of about 90 Kd separated from a basic C-terminal domain by a proline-rich stretch of about 50 residues. The central part of the N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110 amino acid domain.

Catenins [2] are proteins that associate with the cytoplasmic domain of a variety of cadherins. The association of catenins to cadherins produces a complex which is linked to the actin filament network, and which seems to be of primary importance for cadherin cell-adhesion properties. Three different types of catenins exist: alpha, beta, and gamma. Alpha-catenins are proteins of about 100 Kd which are evolutionarily related to vinculin.

... Kemler R. Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).
[1614] 712. (Vitellogenin_N) Lipoprotein amino terminal region

[1618] 714. ssDNA binding protein (Viral DNA bp)

This protein is found in herpesviruses and is needed for replication. 

[1619] 715. (Voltage_CLC) Voltage gated chloride channel ... helices. Each protein forms a single pore. It ...

1996;47:993-998.

[1621] 716. von Willebrand factor type A domain ... with malaria thrombospondin-related anonymous ...

Bork P, Rohde K;

Biochem J 1991;279:908-911.

1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.
FASEB J. 7 308-316 (1993).
... teins.

MATRIX 13 297-306 (1993).
... dictions.

J.MOL.BIOL. 238 104-119 (1994).

4. BORK, P. and ROHDE, K. ... Sequence similarities with malaria thrombospondin-related anonymous protein ...

BIOCHEM.J. 279 908-910 (1991).

... sheet flanked by alpha-helices found in human ras-p21.
FEBS LETT. 358 283-286 (1995).

CELL 80 631-638 (1995).

PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995). 
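[1622] The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved in the etiology of bleeding disorders [1]. In von Willebrand factor, the type A domain (vWF) is the prototype for a protein superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation and signal transduction), involving interaction with a large array of ligands [2]. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands [3]. Fold recognition algorithms were used to score sequence compatibility with a library of known structures: the vWF domain fold was predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5]. 3D structures have been determined for the I-domains of integrins CD11b (with bound magnesium) [6] and CD11a (with bound manganese) [7]. The domain adopts a classic alpha/beta Rossmann fold and contains an unusual metal ion coordination site at its surface.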

[1624] 717. (WD40) WD domain, G-beta repeat
The ancient regulatory-protein family of WD-repeat proteins.
Neer EJ, Schmidt CJ, Nambudripad R, Smith TF;

Nature 1994;371:297-300.

Beta-transducin (G-beta) is one of the three subunits (alpha, beta and gamma) of the guanine nucleotide-binding proteins (G proteins), which act as intermediaries in the transduction of signals generated by transmembrane receptors [1]. The functions of the beta and gamma subunits are less clear but ...

In higher eukaryotes G-beta exists as a small multigene family of highly conserved proteins of about 340 amino acids.
... GPA1 (G-alpha) and STE18 (G-gamma) ... is most probably also a G-beta protein.

- ... a neuronal protein involved ...
- ... DNA replication and separation of chromosomes ...
- ... required for ... anaphase and chromosome separation.
- ... replication of M1 double-stranded RNA.
- ... interact with the Notch and Delta proteins.
- Drosophila TAFII80, a protein that is tightly associated with TFIID.

The number of repeats varies from 4 (PRP4, TUP1 and Groucho) to 8 (G-beta, STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta the repeats span the entire length of the protein; in other proteins they ... the C-terminal section.

Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DENKHLIVMFSTAGC]
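Because WD proteins carry several tandem repeats, counting signature hits gives a rough picture of how many repeats a sequence contains. The sketch below hand-translates the WD consensus pattern into a Python regular expression. It assumes the final residue class reads [DENKHLIVMFSTAGC]; the function name and toy sequence are hypothetical. Many genuine WD repeats do not match the strict signature, and some matches may be spurious, so the hit count is only indicative.

```python
import re
from typing import List, Tuple

# Hand translation of the WD-repeat consensus pattern into a Python regex:
# '[...]' stays a character class, 'x' becomes '.', and 'x(2)' becomes '.{2}'.
WD_SIGNATURE = re.compile(
    r"[LIVMSTAC][LIVMFYWSTAGC][LIMSTAG][LIVMSTAGC].{2}[DN].{2}"
    r"[LIVMWSTAC].[LIVMFSTAG]W[DENKHLIVMFSTAGC]"
)


def wd_signature_hits(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for non-overlapping signature hits."""
    return [(m.start() + 1, m.group()) for m in WD_SIGNATURE.finditer(sequence.upper())]


# Invented toy sequence containing two copies of a segment that fits the pattern.
toy = "MKR" + "LILSGSDGSLKLWD" + "VRA" + "LILSGSDGSLKLWD"
for start, segment in wd_signature_hits(toy):
    print(start, segment)
```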

[1] Gilman A.G.

Annu. Rev. Biochem. 56:615-649(1987).

[2] Duronio R.J., Gordon J.I., Boguski M.S.

Proteins 13:41-56(1992).

[3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).

[4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.

Nature 371:297-300(1994).

[5] Smith T.F., Gaitatzes C.G., Saxena K., Neer E.J.

Biochemistry In Press(1998). 






- Mammalian multifunctional ...

[1629] This domain, which is called WHEP-TRS, ... could contain a central alpha-helical region and may play a role in ...

[2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

[1631] 719. (Worm family 8) Putative membrane protein
Analysis of protein domain families in Caenorhabditis elegans.
Sonnhammer EL, Durbin R; 

... membrane protein
The specific function of this protein is unknown.

[1632] 720. Xylose isomerase
Xylose isomerase is an enzyme found in microorganisms which catalyzes the interconversion of D-xylose and D-xylulose. ...

... conserved amino acids that includes a ...

Consensus pattern: [LI]-E-P-K-P-x(2)-P 
[E is a magnesium ligand] 
[K is an active site residue] 
Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]

[H is an active site residue] 
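The xylose isomerase patterns above use PROSITE notation: residues separated by '-', 'x' for any residue, '[..]' for allowed alternatives, and 'x(n)' or 'x(n,m)' for repeat counts. Below is a minimal Python sketch that translates this subset, plus '{..}' for forbidden residues, into a regular expression and scans a sequence with it. It is not the official ScanProsite tool and ignores features such as the '<' and '>' terminal anchors. The function names and test sequences are hypothetical.

```python
import re
from typing import List, Tuple

# One PROSITE element: a residue, 'x', an allowed set [..] or a forbidden set {..},
# optionally followed by a repeat count such as (2) or (2,4).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE consensus pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, min_rep, max_rep = match.groups()
        if core == "x":
            piece = "."
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # {..} lists forbidden residues
        else:
            piece = core  # a single residue or a [..] class is already a valid regex
        if min_rep is not None:
            piece += "{" + min_rep + ("," + max_rep if max_rep else "") + "}"
        pieces.append(piece)
    return "".join(pieces)


def scan(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for every hit, overlapping hits included."""
    finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in finder.finditer(sequence.upper())]


print(prosite_to_regex("[LI]-E-P-K-P-x(2)-P"))  # [LI]EPKP.{2}P
print(scan("AAALEPKPQRPGG", "[LI]-E-P-K-P-x(2)-P"))  # [(4, 'LEPKPQRP')]
print(scan("MFHDADIAPSG", "[FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]"))  # [(2, 'FHDADIAPSG')]
```

The zero-width lookahead in `scan` lets overlapping hits be reported, which matters when a short pattern can match at adjacent positions.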

[1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).

[2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[3] Henrick K., Collyer C.A., Blow D.M.

" ^^e , ^Ke re ,e-..H M e re onH..Tem fl e,P. 
Biochem. J. 263:195-199(1989). 

[1635] 721. XPG protein signatures

Xeroderma pigmentosum (XP) is a human autosomal recessive disease characterized by a high incidence of sunlight-induced skin cancer. Skin cells from people with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. The defect in XP-G can be corrected by a 133 Kd nuclear protein called XPG (or XPGC). XPG belongs to a family of proteins [2,3,4,5,6] that are composed of two main subsets:

- Subset 1, to which belong XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2 and XPG are single-stranded DNA endonucleases. XPG makes the 3' incision in human DNA nucleotide excision repair [9].
- Subset 2, to which belong mouse and human FEN-1, rad2 from fission yeast, and RAD27 from budding yeast. FEN-1 is a structure-specific endonuclease.

This family also includes:

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that may act in a pathway that corrects mismatched base pairs.
- Yeast EXO1 (DHS1), a protein with probably the same function as exo1.
- Yeast DIN7.

Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a ... It is possible that the conserved ... The amino acids linking the N- and I-regions ...

[1636] Consensus pattern: [VI]-[KRE]-P-x [f-y L] v r u 1 1 [QS] . [C LM]- 

Consensus pattern: ...
[1637] [1] ... Wood R.D. ...
[2] ... Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature 363:182-185(1993).
[3] ... Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R. Nucleic Acids Res. ...
[4] ... Sheldrick K.S., Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. ...
[5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[6] Szankasi P., ... 267:1166-1169(1995).
[7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[8] ... Clarkson S.G., Wood R.D. J. Biol. Chem. 269:15965-15968(1994).
[9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature 371:432-435(1994).

- Hypothetical protein ycdG from Escherichia coli.
- Hypothetical protein ygfO from Escherichia coli.
- Hypothetical protein ygfU from Escherichia coli.
- Hypothetical protein yicE from Escherichia coli.
- Hypothetical protein yunJ from Bacillus subtilis.
- Hypothetical protein yunK from Bacillus subtilis. 

selected as a signature pattern. 

, „•„ WO Ceoch.no G soazzooohk.C.J.Btol.Ch e m.270:a610^622(199SV 

so shown to share regions of similarities. 
- Haemophilus influenzae hypothetical protein HI0042.

- Aquifex aeolicus hypothetical protein AQ_1758.

- Bacillus subtilis hypothetical protein yhcT.

- Bacillus subtilis hypothetical protein yjbO.

- Bacillus subtilis hypothetical protein ylyB.

- Helicobacter pylori hypothetical protein HP0347.

- Helicobacter pylori hypothetical protein HP0745.

. Helicobacter pylori hypothetical ^'" ^095^ 

- Mycoplasma genitalium hypothetical protein MG209.

- Mycoplasma genitalium hypothetical protein MG370.

. Schocystis strain PCC 6803 hypothetica pro e n s 592. 

- Synechocystis strain PCC 6803 hypothetical protein slr1629.

- Yeast hypothetical protein YDL036c.
- Yeast hypothetical protein YGR169C.

- Fission yeast hypothetical protein SPAC...

- Caenorhabditis elegans hypothetical protein K07E8.7.

These are proteins of from 21 to 50 Kd which contain a number of conserved regions in their central section.

... (EC 4.2.1.70) [1] have also been shown to share regions of similarities:
- Escherichia coli ...

- Escherichia coli hypothetical protein ymfC and HI0694, the corresponding Haemophilus influenzae protein.

- Aquifex aeolicus hypothetical protein AQ_554.
- Aquifex aeolicus hypothetical protein AQ_1464.
- Bacillus subtilis hypothetical protein ypuL.
- Bacillus subtilis hypothetical protein ytzF.

- Borrelia burgdorferi hypothetical protein BB0129.
- Helicobacter pylori hypothetical protein HP14...
- Synechocystis strain PCC 6803 hypothetical protein slr036...
- Synechocystis strain PCC 6803 hypothetical protein slr0612.

These are proteins of from 25 to 40 Kd which contain a number of conserved regions in their central section.



Consensus pattern: g-R-L-D-x^HSTA]^ 



uonsensua pcmom. ~ ■ ■ > . - 

[1645] [1] Wrzesinski J., Bakin A., Nurse K., Lane B.G., Ofengand J. Biochemistry 34:8904-8913(1995).
[1646] 724. Zinc finger present in dystrophin, CBP/p300 
ZZ in dystrophin binds calmodulin 
Putative zinc finger; binding not yet shown. 

[1647] 725. Zinc carboxypeptidase

... carboxypeptidases (EC 3.4.17.-) [1,2]. All these enzymes ... pancreatic digestive enzyme ...
... located in secretory granules of ... cleavage during prohormone processing.
It is ideally situated to act on peptide hormones locally ...

... with specific plasma membrane receptors.

- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity similar to that of carboxypeptidase A, but found in the secretory granules of mast cells [3].

- Streptomyces griseus carboxypeptidase (Cpase SG), which combines the specificities of mammalian carboxypeptidases A and B.

... of other transcriptional proteins.
- Yeast hypothetical protein YHR132c.

[1] Tan F., Chan S.J., Steiner D.F., ..., Skidgel R.A.

J. Biol. Chem. 264:20094-20099(1989).
l3]NatahashiV. 

J, Biochem. 107:879-886(1990^ st/okoovtov B., Kuratiova I., 

[5] He G.-P., Muise A., Li A.W., Ro H.-S.

ISSiS^'S"- M., Vacheron M.,, Miche. G Denoroy L, 
45 ^X ^e* S„ Joris B., Weber G„ Ghuysen J.-M. 

Biochem. J. 292:563-570(1993). 
[7] Rawlings N.D., Barrett A.J.
Methods Enzymol. 248:183-228(1995).

726. Zinc finger, C2H2 type
Schematic representation of the C2H2 zinc finger: two Cys (C) and two His (H) residues coordinate the zinc ion; 'x' denotes any amino acid.



regi ons«oundin ^JJ^g^Si. may be present. 

data is ava.lable and that adrenal ^ ^ 


(36),ZNF133(3). wn I61 that a number of other positions are 
Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H [The two C's and two H's are zinc ligands]
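The C2H2 consensus pattern above can be written directly as a regular expression, with the two cysteines and two histidines it flags as zinc ligands captured as groups. Below is a minimal Python sketch. The function name and the toy sequence are hypothetical, and a pattern hit alone does not show that a sequence forms a functional zinc finger.

```python
import re
from typing import List, Tuple

# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, with the four zinc ligands captured.
C2H2 = re.compile(r"(C).{2,4}(C).{3}[LIVMFYWC].{8}(H).{3,5}(H)")


def zinc_ligand_positions(sequence: str) -> List[Tuple[int, ...]]:
    """Return 1-based positions (Cys, Cys, His, His) for each non-overlapping C2H2 hit."""
    hits = []
    for m in C2H2.finditer(sequence.upper()):
        hits.append(tuple(m.start(g) + 1 for g in range(1, 5)))
    return hits


# Invented toy finger; real proteins usually contain several tandem fingers.
print(zinc_ligand_positions("YKCPECGKSFSQSSNLQKHQRTH"))  # [(3, 6, 19, 23)]
```

Because `finditer` resumes after each match, tandem fingers are reported one per hit.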

[1] Klug A., Rhodes D.

Trends Biochem. Sci. 12:464-469(1987).

[2] Evans R.M., Hollenberg S.M.

Cell 52:1-3(1988).

[3] Payre F., Vincent A.

FEBS Lett. 234:245-250(1988).

[4] Miller J., McLachlan A.D., Klug A.

EMBO J. 4:1609-1614(1985).

[5] Berg J.M. Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).

[6] Rosenfeld R., Margalit H.

J. Biomol. Struct. Dyn. 11:557-570(1993).

[1654] 727. Zinc finger, C3HC4 type (RING finger)

A number of eukaryotic and viral proteins contain a conserved cysteine-rich domain of 40 to 60 residues (called C3HC4 zinc finger or 'RING' finger) ... Proteins known to contain this domain are listed below (references are only provided for recently determined sequences).

- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1 activates the rearrangement of immunoglobulin and T-cell receptor genes.
- ... expression directed by the promoter region of the ...
- ... with the Ro proteins.
- Human histocompatibility locus protein RING1.
- Human PML, a probable transcription factor. Translocation of PML with retinoic acid receptor alpha creates a fusion protein which is the cause of acute promyelocytic leukemia (APL).
- Mammalian cbl proto-oncogene.
- ... recognizes and binds a specific DNA sequence ...
- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is involved in the biogenesis of peroxisomes. In humans, defects in PAF-1 are the cause of an autosomal recessive ...
- Baculoviruses protein CG30.
- Baculoviruses major immediate early protein ... (PE-38)

. pattern: 

[1657] [1] Borden K.L.B., Freemont P.S.
Curr. Opin. Struct. Biol. ...

... type (and similar)

Family of CCHC zinc fingers ...

Also contains members ... of indels in the alignment.
[1663] [1] Linnen J.M., Bailey C.P., Weeks D.L.

[1664] 732. 14-3-3 proteins
... coordination of multiple signalling pathways ...

... Tanner JW, Allen PM, Shaw AS;
Cell ...
... located towards the C-terminus.

Wang W, Shakes DC;

J Mol Evol 1996;43:384-398.
Function of 14-3-3 proteins.

... region located in the C-terminal section.
Trends Biochem. Sci. 20:95-97(1995).
[2] Morrison D.

Nature 376:188-191(1995). 

[1670] 733. D-isomer specific 2-hydroxyacic ' D-.actate dehydrogenase .tarn*. in 

their substrate have been shown 1 1 ,*,o.**i »* 

of pyridoxins (vitamin B6). jcDH) _ a bacteria | enzyme that catalyzes the reversible 

vancomycin binding.
- Escherichia coli hypothetical protein ycdW.
- Escherichia coli hypothetical protein yiaE.
- Haemophilus influenzae hypothetical protein HI1556.
- Yeast hypothetical protein YER081W.
- Yeast hypothetical protein YIL074W.



[LIVMCMDNV] 



[1673] 735. 3-beta hydroxysteroid dehydrogenase/isomerase family

Structure and tissue-specific expression of 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase genes in human and rat classical and peripheral steroidogenic tissues.

Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A,
Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al;

J Steroid Biochem Mol Biol 1992;41:421-435.

3 beta-hydroxysteroid dehydrogenase/isomerase (3 beta-HSD) catalyzes the oxidation and isomerization of delta 5-3 beta-hydroxysteroid precursors into delta 4-ketosteroids.

[1674] 736. 3-hydroxyacyl-CoA dehydrogenase 

This family also includes lambda crystallin. 

Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase: preliminary chain tracing at 2.8-A resolution.

Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ;

Proc Natl Acad Sci USA 1987;84:8262-8266.

[1675] 3-Hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) is an enzyme involved in fatty acid metabolism; it catalyzes the interconversion of 3-hydroxyacyl-CoA and 3-oxoacyl-CoA. Most eukaryotic cells contain two fatty acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In peroxisomes, HCDH forms a multifunctional enzyme with enoyl-CoA hydratase (ECH). In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), HCDH is part of a multifunctional enzyme that also contains ECH activity.

[1677] The other proteins structurally related to HCDH are:

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157), which interconverts 3-hydroxybutanoyl-CoA and acetoacetyl-CoA.
- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs (such as rabbit).
A signature pattern has been derived from this central region.



(2)-[GV] 



[4] Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H., de Jong W.W.
J. Biol. Chem. 263:15462-15466(1988).

[1678] 737. 60S acidic ribosomal protein

Proteins P1, P2, and P0, components of the eukaryotic ribosome stalk. New structural and functional aspects.

Remacha M. Jimenez-Diaz A, Santos C, Briones E, Zambrano R, 

Rodriguez Gabriel MA, Guarinos E, Ballesta JP; 

Biochem Cell Biol 1995;73:959-968. 

This family includes archaebacterial L12, eukaryotic P0, P1 and P2.

[1679] 738. 6-Phosphogluconate dehydrogenase (6PGD) catalyzes the third step in the hexose monophosphate shunt.

A region which has been shown to be involved in the binding of 6-phosphogluconate has been selected as a signature pattern.
- Consensus pattern: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. 
Mol. Microbiol. 5:1081-1089(1991). 

[2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S.,
Helliwell J.R., Pickersgill R.W., White S.W.
EMBO J. 2:1009-1014(1983). 
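The consensus patterns in these entries use PROSITE syntax: elements are separated by '-', [..] lists allowed residues, {..} lists excluded residues, x stands for any residue, and (n) or (n,m) repeats the preceding element. As an illustrative sketch only (the helper names prosite_to_regex and find_sites are hypothetical and are not part of PROSITE or of any library), such a pattern can be translated into a Python regular expression and scanned against a sequence. The 6PGD pattern above is used, and the test sequence is invented for the example.

```python
import re

# One PROSITE pattern element: a residue set [..], an excluded set {..},
# a single residue letter or the wildcard x, optionally followed by a
# repeat count (n) or a repeat range (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE consensus pattern into a Python regular expression.

    A leading '<' or trailing '>' (N- or C-terminal anchor) becomes ^ or $.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        start_anchor = element.startswith("<")
        end_anchor = element.endswith(">")
        element = element.strip("<>")
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                                  # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"           # any residue except these
        else:
            core = residue                              # one residue or a [..] set
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start_anchor else "") + core + ("$" if end_anchor else ""))
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[int]:
    """Return the 0-based start of every match, including overlapping ones."""
    lookahead = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in lookahead.finditer(sequence)]


if __name__ == "__main__":
    pgd = "[LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W"
    print(prosite_to_regex(pgd))                     # [LIVM].D.{2}[GA][NQS]KGTG.W
    print(find_sites(pgd, "MAAIVDQAGNKGTGLWSSR"))    # [3]
```

Matches are collected through a zero-width lookahead so that overlapping hits are all reported; calling re.finditer on the bare expression would skip a hit that starts inside a previous one.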



- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5].
- Acetylcholine, muscarinic-type, M1 to M5.
- Adenosine A1, A2A, A2B and A3 [6].
- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7].
- Angiotensin II types I and II.
- Bombesin subtypes 3 and 4.
- Bradykinin B1 and B2.
- C3a and C5a anaphylatoxin.
- Cannabinoid CB1 and CB2.
- Chemokines C-C CC-CKR-1 to CC-CKR-8.
- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4.
- Cholecystokinin-A and cholecystokinin-B/gastrin.
- Dopamine D1 to D5 [8].
- Endothelin ET-a and ET-b [9].
- fMet-Leu-Phe (fMLP) (N-formyl peptide).
- Follicle stimulating hormone (FSH-R) [10].
- Galanin.
- Gastrin-releasing peptide (GRP-R).
- Gonadotropin-releasing hormone (GNRH-R).
- Histamine H1 and H2 (gastric receptor I).
- Lutropin-choriogonadotropic hormone (LSH-R) [10].
- Melanocortin MC1R to MC5R.
- Melatonin.
- Neuromedin B (NMB-R).
- Neuromedin K (NK-3R).
- Neuropeptide Y types 1 to 6.
- Neurotensin (NT-R).
- Octopamine (tyramine), from insects.
- Odorants [11].
- Opioids delta-, kappa- and mu-types [12].
- Oxytocin (OT-R).
- Platelet activating factor (PAF-R).
- Prostacyclin.
- Prostaglandin D2.
- Prostaglandin E2, EP1 to EP4 subtypes.
- Prostaglandin F2.
- Purinoreceptors (ATP) [13].
- Somatostatin types 1 to 5.
- Substance-K (NK-2R).
- Substance-P (NK-1R).
- Thrombin.
- Thromboxane A2.
- Thyrotropin (TSH-R) [10].
- Thyrotropin releasing factor (TRH-R).
- Vasopressin V1a, V1b and V2.
- Visual pigments (opsins and rhodopsin) [14].
- Proto-oncogene mas.
- A number of orphan receptors (whose ligand is not known) from mammals and birds.
- Caenorhabditis elegans putative receptors, including C43C3.2, T27D1.3 and ZC84.4.
- Cytomegalovirus putative receptors US27, US28, and UL33.

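[1682] The structure of all these receptors is thought to be identical. They have seven hydrophobic regions, each of which most probably spans the membrane. The N-terminus is located on the extracellular side of the membrane and is often glycosylated, while the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Most, but not all, of these receptors lack a signal peptide.

[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM]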



[1] Strosberg A.D.
Eur. J. Biochem. 196:1-10(1991).
[2] Kerlavage A.R.
DNA Cell Biol. 11:1-20(1992).
[4] Savarese T.M., Fraser C.M.
Biochem. J. 283:1-9(1992).
[5] Branchek T.
Curr. Biol. 3:315-317(1993).
[6] Stiles G.L.
J. Biol. Chem. 267:6451-6454(1992).
[7] Frielle T., Kobilka B.K., Lefkowitz R.J., Caron M.G.
Trends Neurosci. 11:321-324(1988).
[8] Stevens C.F.
Curr. Biol. 1:20-22(1991).
[9] Sakurai T., Yanagisawa M., Masaki T.
Trends Pharmacol. Sci. 13:103-107(1992).
[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J.
Biochimie 73:109-120(1991).
[11] Lancet D., Ben-Arie N.
Curr. Biol. 3:668-674(1993).
[12] Uhl G.R., Childers S., Pasternak G.
Trends Neurosci. 17:89-93(1994).
[13] Barnard E.A., Burnstock G., Webb T.E.
Trends Pharmacol. Sci. 15:67-70(1994).
[14] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).
[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C.
Gene 98:153-159(1991).
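A minimal sketch of scanning a sequence with the receptor pattern fragment shown above, assuming the OCR-damaged residue sets read [LIVMFYWSTAC] and [LIVM]. The regular expression is a hand translation, the name arginine_positions is hypothetical, and the short sequences in the example are made up rather than taken from any receptor. The capturing group inside the lookahead records where the arginine of each match lies, and the lookahead lets overlapping matches be reported.

```python
import re

# Hand translation (assuming the reading of the damaged text) of
# [LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM].
# The whole pattern sits in a zero-width lookahead so overlapping matches
# are all found; group 1 captures the arginine.
GPCR_MOTIF = re.compile(r"(?=[LIVMFYWSTAC][DENH](R)[FYWCSH].{2}[LIVM])")


def arginine_positions(sequence: str) -> list[int]:
    """Return the 0-based index of the pattern's arginine in each match."""
    return [m.start(1) for m in GPCR_MOTIF.finditer(sequence)]


if __name__ == "__main__":
    # Invented fragments, not real receptor sequences.
    print(arginine_positions("GSLAIDRYLAVTQ"))   # [6]
    print(arginine_positions("IERYAAVERYAAV"))   # [2, 8]  (the two hits share residue V6)
```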
more color pigments (for example, in mammals, red and green). In Drosophila, each ommatidium contains eight photoreceptor cells, and each of the three types of cells (R1-R6, R7 and R8) expresses a specific opsin. The retinal G protein-coupled receptor (RGR), also known as retinal photoisomerase [3], is related to the opsins.

Retinal is attached to a conserved lysine in the seventh transmembrane helix. The pattern that has been developed includes this residue.

- Consensus pattern: [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-[IY]

[K is the retinal binding site] 

[1] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).
[2] Fryxell K.J., Meyerowitz E.M.
J. Mol. Evol. 33:367-378(1991).

[3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W.

Biochemistry 33:13117-13125(1994).
[1689] The following descriptions of protein family functions are not provided by the Pfam or Prosite databases.
[1690] 740. BAH 

BAH domain. Number of members: 65 

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function.

[LIVMY] 

[1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).
[2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).

[3] Dihanich M. Experientia 46:146-153(1990).
[4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987).
[5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

[1693] 743. Glyco_hydro_19
Chitinases family 19 signatures
Chitinases of family 19 in the classification of glycosyl hydrolases [2] are enzymes, mainly from plants, that function in defense against fungal and insect pathogens by destroying their chitin-containing cell wall.

[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991). 

[1695] 744. MBD 

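Methyl-CpG binding domain. The methyl-CpG binding domain (MBD) binds to DNA that contains one or more symmetrically methylated CpGs [1].

DNA methylation in animals is associated with alterations in chromatin structure and silencing of gene expression. In vitro footprinting has shown that the MBD can protect a 12-nucleotide region surrounding a methyl-CpG pair [1]. MBDs are found in several methyl-CpG binding proteins and also in a DNA demethylase [2]. Number of members: 11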

mMed.ine 9423281 3. Dissection of the methyl-CpG binding domain from the chromosomal protein MeCP2. Nan 


Ramchandani S, Cervoni N, Szyf M; Nature 1999;397:579-583. 

[1696] 745. Peptidase C1 
Eukaryotic thiol (cysteine) proteases active sites

PROSITE cross-reference(s): THIOL_PROTEASE_CYS; THIOL_PROTEASE_HIS

Proteins known to belong to this family include the following (references are only provided for recently determined sequences):

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15) and S (EC 3.4.22.27) [2].

- Human cathepsin O [4].

- Bleomycin hydrolase, an enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes, including barley aleurain (EC 3.4.22.16), kidney bean EP-C1, rice bean SH-EP, kiwi fruit actinidin, papaya caricain (EC 3.4.22.30) and Arabidopsis thaliana RD21A.

- House-dust mite allergens DerP1 and EurM1.

AC-2), and Ostertagia ostertagi (CP-1 and CP-3). 

- Slime mold cysteine proteinases CP1 and CP2.

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.

- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.

[1697] Two bacterial peptidases are also part of this family: 

. Aminopeptidase C from Lactococcus lactis (gene pepC) [5]. 
- Thiol protease tpr from Porphyromonas gingivalis.

Note: these proteins belong to families C1 (papain) and C2 (calpains) in the classification of peptidases [7].
[1] Dufour E. Biochimie 70:1335-1342(1988).

[7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1700] 746. Peptidase M22

Glycoprotease family signature. PROSITE cross-reference(s): GLYCOPROTEASE. The glycoprotease is a metalloprotease secreted by Pasteurella haemolytica [1]. It is highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X). 
- Bacillus subtilis hypothetical protein ydiE.

- Mycobacterium leprae hypothetical protein U229E 

- Mycobacterium tuberculosis hypothetical protein MtCYmiO. 

- Synechocystis strain PCC 6803 hypothetical protein slr0807. 

- Methanococcus jannaschii hypothetical protein MJ1130.

- Haloarcula marismortui hypothetical protein in HSH 3 region.

- Yeast hypothetical protein YKR038C. 

- Yeast hypothetical protein QRI7. 

One of the conserved regions of these proteins contains conserved histidines; it is possible that this region is involved in metal binding.

Note: these proteins belong to family M22 in the classification of peptidases [2].
[1] Abdullah K.M., Lo R.Y.C., Mellors A.

[2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

CP, Holmarm K, Bork P; Pret Soi |9S7*24MM. . mecnomS re lo. modular 

Proteins related to tyrosinase are known to exist in mammals:

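- TRP-1 (TYRP1) [8], which is responsible for the conversion of 5,6-dihydroxyindole-2-carboxylic acid (DHICA) to indole-5,6-quinone-2-carboxylic acid.
- TRP-2 (TYRP2), the melanogenic enzyme DOPAchrome tautomerase (EC 5.3.3.12) that catalyzes the conversion of DOPAchrome to DHICA. TRP-2 binds zinc instead of copper [7].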

[1707] Other proteins that belong to this family are: 

- Plant polyphenol oxidases (PPO) (EC 1.10.3.1), which catalyze the oxidation of mono- and o-diphenols to o-diquinones [8].

- Caenorhabditis elegans hypothetical protein C02C2.1.

[1] Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).
[2] Jackman M.P., Hajnal A., Lerch K.
[3] Linzen B. Naturwissenschaften 76:206-211(1989).
ent addition of glutamate moieties to tetrahydrofolate. 
catalytic activity and/or in substrate binding.


to these families, 
[1714] Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).

- An enzyme that may play a role in regulating growth and differentiation of early B-lineage cells.

- Yeast aminopeptidase yscII (gene APE2).

- Yeast alanine/arginine aminopeptidase (gene AAP1).

- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of the epoxide moiety of LTA-4 to form LTB-4; it has been shown that it binds zinc and is capable of peptidase activity.

[1715] Family M2 

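- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the enzyme responsible for the conversion of angiotensin I to angiotensin II. There are two forms of ACE: a testis-specific isozyme and a somatic isozyme which has two active centers.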
[1716] Family M3 

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic degradation of small peptides.
- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal endopeptidase).

- Mitochondrial intermediate peptidase (EC 3.4.24.59), which is involved in the maturation of some proteins imported into the mitochondrion.
- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).

- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene dcp).
- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC).

- Yeast hypothetical protein YKL134C. 

[1717] Family M4

- Thermostable thermolysins (EC 3.4.24.27) and related thermolabile neutral proteases (bacillolysins) (EC 3.4.24.28) from various species of Bacillus.

- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB). 
- Extracellular elastase from Staphylococcus epidermidis.

- Extracellular protease prt1 from Erwinia carotovora. 

- Extracellular minor protease smp from Serratia marcescens. 

- Vibriolysin (EC 3.4.24.25) from various species of Vibrio. 

- Protease prtA from Listeria monocytogenes. 

- Extracellular proteinase proA from Legionella pneumophila.

[1718] Family M5 
- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.

[1719] Family M6
- Immune inhibitor A (Ina) from Bacillus thuringiensis. Ina degrades two classes of insect antibacterial proteins, attacins and cecropins.
[1720] Family M7

- Streptomyces extracellular small neutral proteases.

[1721] Family M8
- Leishmanolysin (EC 3.4.24.36), a cell surface protease from various species of Leishmania.

[1722] Family M9 


[1723] Family M10A 
. Yeast hypothetical protein YlUObw. 

[1724] Family M10B
- Interstitial collagenase (EC 3.4.24.7).

- Soybean metalloendoproteinase 1.

[1725] Family M11 
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).

[1726] Family M12A 

so purpuratus. 

egT^raTellular matrix, at the time o. hatching. 
[1727] Family M12B 
trimerelysin I (EC 3.4.24.52) and II (EC 3.4.24.53).

- Mouse cell surface antigen MS2. 

[1728] Family M13

- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein is very probably a zinc endopeptidase.

- Peptidase O from Lactococcus lactis (gene pepO). 

[1729] Family M27

- Clostridial neurotoxins, which cleave the synaptic proteins synaptobrevins, syntaxin and SNAP-25 [7,8].

[1730] Family M30 

- Staphylococcus hyicus neutral metalloprotease.
[1731] Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme from Thermus aquaticus which is most active at high temperature.

[1732] Family M34

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the anthrax toxin.

[1733] Family M35

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various species of Aspergillus.

[1734] Family M36
- Extracellular elastinolytic metalloproteinases from Aspergillus.
[1735] From the tertiary structure of thermolysin, the positions of the residues acting as zinc ligands and of the active site residue are known. A signature pattern that includes the two zinc-binding histidines and the active site glutamic acid residue is sufficient to detect this superfamily of proteins.

[The two H's are zinc ligands] [E is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for members of families M5, M7 and M11.
Other sequence(s) detected in SWISS-PROT: 55, including Neurospora crassa conidiation-specific protein 13, which could be a zinc protease.

[1] Jongeneel C.V., Bouvier J., Bairoch A.

FEBS Lett. 242:211-214(1989).

[2] Murphy G.J.P., Murphy G., Reynolds J.J.
FEBS Lett. 289:4-7(1991).

[3] Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay D.B., Stoecker W.

Zoology 99:237-246(1996).
[4] Rawlings N.D., Barrett A.J.

Meth. Enzymol. 248:183-228(1995).

[5] Woessner J. Jr.

FASEB J. 5:2145-2154(1991).

[6] Hite L.A., Fox J.W., Bjarnason J.B.
[7] Montecucco C., Schiavo G.

Trends Biochem. Sci. 18:324-327(1993). 

[ 8]Niemann H., Blasi J., Jahn R. 

Trends Cell Biol. 4:179-185(1994). 



nucleoside found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved

one atom of zinc essential for its native conformation and tRNA recognition. Arluison V, Hountondji C, Robert B, Grosjean H; Biochemistry 1998;37:7268-7276.

biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroA), plants and fungi (where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway) [1]. A conserved region located in the C-terminal part of the protein contains a lysine which seems to be important for the activity of the enzyme.

[1741] Description of pattern(s) and/or profile(s)

[1] Stallings W.C., Abdel-Meguid S.S., Lim L.W., Shieh H.S., Dayringer H.E.

R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991). 
[1743] 753. Glyco_hydro_18 
Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure 1994;2:1169-1180.
[1744] 754. Esterase 

This family contains esterase D (Swiss: P10768). However, it is not clear if all members of the family have the same
function. This family is possibly related to the COesterase family.
Number of members: 36

This domain is found in a number of proteins that transport or detoxify heavy metals. It contains two conserved cysteines that could be involved in the binding of these metals. The domain has been termed Heavy-Metal-Associated (HMA). It has been found in:

- A variety of cation-transporting ATPases, such as the human copper-transporting ATPases ATP7A and ATP7B, which are respectively involved in Menkes' and Wilson's diseases. ATP7A and ATP7B both contain six tandem copies of the HMA domain. The copper ATPases CCC2 from budding yeast, copA from Enterococcus faecalis and synA from Synechococcus contain one copy of the HMA domain. The cadmium ATPase cadA from plasmid pI258 of Staphylococcus aureus also contains a single HMA domain, while a chromosomal Staphylococcus aureus cadA contains two copies. Other, less characterized ATPases that contain the HMA domain are fixI from Rhizobium meliloti, pacS from Synechococcus strain PCC 7942, Mycobacterium leprae ctpA and ctpB, and Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s) are located in the N-terminal section.

- Mercuric reductase (gene merA), a class-I pyridine nucleotide-disulphide oxidoreductase (see <PDOC00073>). There is generally one HMA domain (with the exception of a chromosomal merA from Bacillus strain RC607, which has two) in the N-terminal part of merA.

- Mercuric transport protein periplasmic component (gene merP), found in mercury-resistant Gram-negative bacteria. It seems to be a mercury scavenger that specifically binds one Hg(2+) ion and passes it to the mercuric reductase via the merT protein. The N-terminal half of merP is an HMA domain.
- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of copper.

[1746] The consensus pattern for HMA spans the complete domain.

[The two C's probably bind metals]

[1] Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994).

[2] Lin S.-J., Culotta V.L. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995).

[1748] 756. (Peptidase M10) Matrixins cysteine switch

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>), are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature enzyme by the presence of an N-terminal propeptide.

A highly conserved octapeptide is found near the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3]: a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been called the 'cysteine switch' or 'autoinhibitor region'.
A cysteine switch has been found in the following zinc proteases: 

- MMP-1 (EC 3.4.24.7) (interstitial collagenase). 

- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase). 

- MMP-3 (EC 3.4.24.17) (stromelysin-1).

- MMP-7 (EC 3.4.24.23) (matrilysin).

- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).

- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).

- MMP-10 (EC 3.4.24.22) (stromelysin-2).

- MMP-11 (EC 3.4.24.-) (stromelysin-3).

- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).

- MMP-13 (EC 3.4.24.-) (collagenase 3).

- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).

- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).

- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).

- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1749] Description of pattern(s) and/or profile(s) 

Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]

[1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).

[3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
Subtilases [1,2] are an extensive family of serine proteases whose catalytic residues occur in an order different from that of the analogous residues in the trypsin family, so that they form a distinct category of proteases. The family includes the following proteases:

- Extracellular proteases from various Bacillus.
- Intracellular serine proteases from Bacillus.
- Minor extracellular serine protease epr from Bacillus subtilis (gene epr).
- Nisin leader peptide processing protease nisP from Lactococcus lactis.
- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).
- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris.
- Calcium-dependent protease (gene prcA).
- Halolysin from halophilic bacteria sp. 172P1 (gene hly).
- Alkaline extracellular protease (gene xpr2).
- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1).
- Cuticle-degrading protease from Metarhizium anisopliae.
- Proteinase T from Tritirachium album.
- Subtilisin-like protease III from yeast (gene YSP3).
- Proprotein convertases, including PACE4 protease, from mammals and other organisms.
- Prestalk-specific proteins tagB and tagC from Dictyostelium discoideum [4], which consist of a subtilase catalytic domain and a C-terminal ABC transporter domain (see <PDOC00185>).

Note: if a protein includes at least two of the three active site signatures, the probability of it being a serine protease from the subtilase family is 100%.

Note: these proteins belong to family S8 in the classification of peptidases [5,E1].

[1] Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-737(1991).
[2] Siezen R.J. (In) Proceedings subtilisin symposium, Hamburg (1992).
[3] Barr P.J. Cell 66:1-3(1991).

[4] Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).
[5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1752] 758. (SSB) Single-strand binding protein family signatures 
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix-destabilizing protein, is a protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important role in DNA replication, recombination and repair.

[1753] Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids. 
SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens. 
[1754] Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication are structurally and evolutionarily related to prokaryotic SSB. Proteins currently known to belong to this subfamily are listed below [2].

- Mammalian protein Mt-SSB (P16).

- Xenopus Mt-SSBs and Mt-SSBr.

- Drosophila MtSSB.
- Yeast protein RIM1.

[1755] Two signature patterns have been developed for these proteins. The first is a conserved region in the N- 
terminal section of the SSB's. The second is a centrally located region which, in Escherichia coli SSB, is known to be 
involved in the binding of DNA. 
[1756] Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DET]
Consensus pattern: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[1] Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).
[2] Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).
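Because SSB proteins are described by two signatures, one simple check is to require both. The sketch below hand-translates the two consensus patterns into Python regular expressions; the dictionary keys reuse the identifiers SSB_1 and SSB_2 from the cross-reference line, assuming they correspond to the first and second pattern in the order listed, and the function names are hypothetical. Requiring both hits is an illustrative heuristic, not a PROSITE rule, and the test sequence is synthetic.

```python
import re

# Hand translations of the two SSB consensus patterns; mapping them to the
# identifiers SSB_1 and SSB_2 (in the order listed) is an assumption.
SSB_SIGNATURES = {
    "SSB_1": r"[LIVMF][NST][KRT][LIVM].[LIVMF]{2}G[NHRK][LIVM][GST].[DET]",
    "SSB_2": r"T.W[HY][RNS][LIVM].[LIVMF][FY][NGKR]",
}


def scan_ssb(sequence: str) -> dict[str, list[int]]:
    """Map each signature to the 0-based starts of its (overlapping) matches."""
    return {
        name: [m.start() for m in re.finditer("(?=" + regex + ")", sequence)]
        for name, regex in SSB_SIGNATURES.items()
    }


def has_both_signatures(sequence: str) -> bool:
    """Illustrative heuristic: True only when both signatures are found."""
    return all(scan_ssb(sequence).values())


if __name__ == "__main__":
    # Synthetic sequence constructed to contain one hit for each signature.
    toy = "MASRGVNKVILVGNLGQDPEVRAAKEQTEWHRVVLFGKLAE"
    print(scan_ssb(toy))             # {'SSB_1': [5], 'SSB_2': [27]}
    print(has_both_signatures(toy))  # True
```

A sequence carrying only the N-terminal signature is reported with an empty list for SSB_2 and fails the combined check.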

[1757] 759. KDPG and KHG aldolases active site signatures 

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160; ALDOLASE_KDPG_KHG_2
[1758] 4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the reversible cleavage of 4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14) (KDPG-aldolase) catalyzes the reversible cleavage of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and glyceraldehyde 3-phosphate.

[1759] These two enzymes are structurally and functionally related [1]. They are both homotrimeric proteins of approximately 220 amino-acid residues. They are class I aldolases whose catalytic mechanism involves the formation of a Schiff-base intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes, an arginine is required for catalytic activity.

[1760] Two signature patterns were developed for these enzymes. The first one contains the active site arginine and 

the second, the lysine involved in the Schiff-base formation. 

[1761] Description of pattern(s) and/or profile(s)
Consensus pattern: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]
Consensus pattern: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base formation]

[1762] [1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).
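Each KDPG/KHG pattern carries an annotation that singles out one residue: the active site arginine in the first, the Schiff-base lysine in the second. One way to carry such an annotation into code is a named group in each hand-translated regular expression, as sketched below. The dictionary keys and the function name are illustrative labels, not PROSITE identifiers, and the test sequence is synthetic.

```python
import re

# Hand translations of the two KDPG/KHG aldolase patterns. The named group
# "site" marks the residue singled out by each pattern's annotation.
ALDOLASE_PATTERNS = {
    "active_site_R": r"G[LIVM].{3}E[LIV]T[LF](?P<site>R)",
    "schiff_base_K": r"G.{3}[LIVMF](?P<site>K)[LF]FP[SA].{3}G",
}


def annotated_sites(sequence: str) -> list[tuple[str, int, str]]:
    """Return (label, 0-based position, residue) for every annotated site."""
    sites = []
    for label, regex in ALDOLASE_PATTERNS.items():
        # Lookahead so that overlapping matches are not skipped.
        for m in re.finditer("(?=" + regex + ")", sequence):
            position = m.start("site")
            sites.append((label, position, sequence[position]))
    return sites


if __name__ == "__main__":
    toy = "MKGIAAAEVTLRSSGAAAIKLFPAEEEGQ"  # synthetic, for illustration only
    print(annotated_sites(toy))
    # [('active_site_R', 11, 'R'), ('schiff_base_K', 19, 'K')]
```

Keeping the annotated residue in a named group means the reported position does not depend on where the surrounding match starts.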

[1763] 760. AP endonucleases family 1 signatures. PROSITE cross-reference(s): PS00726; AP_NUCLEASE_F1_1, PS00727; AP_NUCLEASE_F1_2, PS00728; AP_NUCLEASE_F1_3

[1764] DNA-damaging agents such as the antitumor drugs bleomycin and neocarzinostatin, or those that generate oxygen radicals, produce a variety of lesions in DNA. Among these is base loss, which forms apurinic/apyrimidinic (AP) sites, and strand breaks with atypical 3' termini. DNA repair at AP sites is initiated by specific endonuclease cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking groups from the 3' terminus of DNA strand breaks.

[1765] AP endonucleases can be classified into two families on the basis of sequence similarity. Family 1 groups the enzymes listed below [1].
- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA).

- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA).

- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).

- Drosophila recombination repair protein 1 (gene Rrp1).

- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

Some of these proteins contain additional and unrelated sequences in their N-terminal section.

Consensus pattern: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
Consensus pattern: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-[STAGC]

The beta subunit of ETF is a protein of about 32 Kd which is structurally related to the bacterial nitrogen fixation protein fixA.
[1777] Other related proteins are: 

- Escherichia coli hypothetical protein ydiR. 

- Escherichia coli hypothetical protein ygcQ. 

[1778] A highly conserved region located in the C-terminal section was selected as a signature pattern.

[1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

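[1780] 763. (lectin c) C-type lectin domain. PROSITE cross-reference(s): C_TYPE_LECTIN_1; C_TYPE_LECTIN_2

A number of different families of proteins share a conserved domain which was first characterized in some animal lectins and which seems to function as a calcium-dependent carbohydrate recognition domain. This domain is known as the C-type lectin domain (CTL); it contains four perfectly conserved cysteines involved in two disulfide bonds. A schematic representation of the CTL domain is shown below.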
xcxxxxcxxxxxxx
xxxCx

CxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxWxCx

**********

'C': conserved cysteine involved in a disulfide bond.
'c': optional cysteine involved in a disulfide bond.
'*': position of the pattern.



- Asialoglycoprotein receptors.

regulation of IgE preoption and in the •'—r^ tnal coula be involved in ankylosis. 
- Kupffer cell receptor.
- Proteins expressed on the surface of natural killer cells, including NKG2, NKR-P1, YE1/48 (Ly-49) and CD69.

unclear whether they are likely to bind carbohydrates. 

Proteins that consist of an N-terminal collagenous domain followed by a CTL domain are sometimes called 'collectins'. They include:

- Pulmonary surfactant-associated protein A (SP-A), a calcium-dependent protein that binds to surfactant phospholipids.

- Bovine conglutinin, which binds immune complexes through the complement component (iC3b).
- Mannose-binding proteins (MBP). MBPs bind mannose and N-acetyl-D-glucosamine in a calcium-dependent manner.

- Bovine collectin-43 (CL-43).

Selectins (LEC-CAMs) are cell adhesion molecules that mediate the interaction of leukocytes with platelets or vascular endothelium. They include:

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion molecule-1 (LAM-1), Leu-8, gp90-mel, or LECAM-1).
- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2).
[1786] The ligand recognized by ELAM-1 is sialyl-Lewis x.

- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3). The ligand recognized by GMP-140 is Lewis x.

terminal section: 

- Brevican. 
signalling. 

V-type region, two or four link domains (see <PDOC00955>).

[1789] Two type-I membrane proteins:

- Mannose receptor from macrophages.

- 180 Kd secretory phospholipase A2 receptor. Its structure is highly similar to that of the mannose receptor.

domains (see <PDOC00988», a CTL <PDO C00912>). 
domain, 2 VWFC domains (see <PDOC00928), ana a o ^rv l 

[1790] Various other proteins that uniquely consist of a CTL domain: 

as an inhibitor of spontaneous calcium carbonate precipitation.

- Eosinophil granule major basic protein (MBP), a cytotoxic protein.

- Lectin from cockroach [8].

- Sea raven antifreeze protein (AFP) [9].

Both a signature pattern and a profile have been developed for the CTL domain; the profile is much more sensitive than the pattern.



273 



EP 1 033 405 A2 

[2] Freeman M., Ashkenas J., et al.
Proc. Natl. Acad. Sci. U.S.A. 87:8810-8814(1990).

from Nmeningitidia and N.gono ^ 0l HMTOphi , u s _ 

40 3161-3167. 

BRCA1 C Terminus (BRCT) domain. The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA damage. BRCT domains can associate as homo- or hetero-oligomers.

[5] Medline: 99016060. Structure of an XRCC1 BRCT domain.
768. Chitinases family 18 active site



a) Chitinases from: 

- Nematode (Brugia malayi).



b) Other proteins: 



- Kluyveromyces lactis killer toxin alpha subunit.
- Endo-beta-N-acetylglucosaminidase (EC 3.2.1.96).
[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).

„,„. M ass,ah JJUortd. phosphatases sigh»,s 
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[18101 Consensus P attemILIVM]-x-u. x W 

GTP cyclohydrolase I catalyzes the first step in the biosynthesis of tetrahydrofolate.

bindin9 « mIDE Nl-[LIVM](2)-x(2)-[KRNQ)-[DEN]-[LIVMl-x(3)-lSTl-x-C-E- H-H 


..action is the second M» synthetic pathway ..reductase complexed with 

" . M a j ot M .e,n,e mW ane,W-.aih(n n »^WO,eins, (8 .n. W ». 

- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.

- Escherichia coli lipoprotein nlpD.

- Escherichia coli osmotically inducible lipoprotein B (gene osmB).

- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
s . Escherichia col, peptide £ ( ^ ^ an d rplB). 

J0 . A number of Bacillus beXa ^ cX ™*t^ deM na protein (gene oppA). 

- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
. streptococcus pneumoniae oiiaup h 





- Treponema [...].

- Vibrio harveyi chitobiase (gene chb).

[...] archaebacteria, protein known to be modified [...] a pattern and a set of rules to [...]

(1994). 

PROSITE cross-reference(s): [...] amino acids and [...] types of aminoacyl-tRNA [...]
the class I synthetases [7].
the active site residue]
[1831] It has been [...]

- Glucoamylase 1 (EC 3.2.1.3) [...].
- Yeast hypothetical protein YBR229c.

- Fission yeast hypothetical protein SpAC30D11.0[...].

second region, which contains a conserved [...] the active site [...]

[1833] Consensus pattern: [...]
Consensus pattern: G-[AV]-D-[...]

[...] Henrissat B. [...]

[4] Hermans M.M.P., Kroos M.A., [...]



(1991) 

[1834] 778. Urease signatures 
[...]ions and the region of the active site histidine [...]. Consensus pattern: [...]-H-[LIVM]-H-x(3)-P [The two H's bind nickel]

[...] of a phosphate group attached to a tyrosine. [...] PTPases have been characterized and can be classified [...]. The currently known PTPases are listed below:
[1841] Soluble PTPases. 

- PTPN1 (PTP-1B). 

[...] and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
[...] (SH-PTP[...]; Syp), enzymes which contain two copies of the [...]

- PTPN7 (LC-PTP; hematopoietic protein-tyrosine phosphatase, HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).

- [...] PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.
- Yeast CDC[...], which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

[1842] Dual specificity PTPases. 

- DUSP1 (PTPN10; MAP kinase phosphatase-1, MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).



- [...] which dephosphorylates [...] MAP kinase [...].

- Yeast YVH1.
- Vaccinia virus H1 PTPase.

[1843] Receptor PTPases. 



[1844] Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by a transmembrane region and a C-terminal catalytic [...] domain. Some of the receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains [...] or carbonic anhydrase-like domains in their extracellular region. [...] of the PTPase domain. The first [...]; [...] seems to affect substrate specificity of the first. [...]

[1845] In the following table, the domain structure of known receptor PTPases is shown.



Extracellular domains: Ig, FN-3, CAH, MAM; intracellular domain: PTPase

- Leukocyte common antigen (LCA) (CD45)
- Leukocyte antigen related (LAR)
- Drosophila DLAR
- Drosophila DPTP
- PTP-alpha (LRP)
- PTP-beta
- PTP-gamma
- PTP-delta
- PTP-epsilon
- PTP-kappa
- PTP-mu
- PTP-zeta



[...] are not structurally related to the above PTPases. [...] pattern and [...] profiles. As profiles are much more [...]


[1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[3] Trowbridge I.S. J. Biol. Chem. [...]:23517-23520(1991).
[4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[5] Hunter T. Cell 58:1013-1016(1989). 

[1852] 780. Connexins signatures

[1853] Gap junctions [1] are specialized [...] membrane which consist of closely packed pairs of transmembrane channels, the connexons, through which [...] is often referred to as connexin. [...] of connexins which are currently known are listed below.


- Connexin 56 (Cx56).
- Connexin 50 (Cx50) (lens fiber protein MP70).
- Connexin 46 (Cx46) (alpha-3).
- Connexin 45 (Cx45) (alpha-6).
- Connexin 43 (Cx43) (alpha-1).
- Connexin 40 (Cx40) (alpha-5).
- Connexin 38 (Cx38) (alpha-2).
- Connexin 37 (Cx37) (alpha-4).
- Connexin 33 (Cx33) (alpha-7).
- Connexin 32 (Cx32) (beta-1).
- Connexin 31.1 (Cx31.1) (beta-4).
- Connexin 31 (Cx31) (beta-3).
- Connexin 30.3 (Cx30.3) (beta-5).
- Connexin 26 (Cx26) (beta-2).

[...] below.

[Schematic: connexin topology relative to the cytoplasmic side, the membrane and the extracellular side.]



[1855] The sequences of [...]

[...] bonds]

[1857] [1] Goodenough D.A., Goliger J.A., Paul D.L. Annu. Rev. Biochem. 65:475-502(1996).

[1858] 781. Gram-positive cocci surface proteins 'anchoring' hexapeptide [...]

This structure is shown in the following schematic representation:



| Variable length extracellular domain |H| Anchor |B| 



'H': conserved hexapeptide.
'B': cluster of basic residues.
[...] necessary for the proper anchoring of the proteins which bear it to the cell wall.
Proteins known to contain such a hexapeptide are listed below.

- Aggregation substance from Streptococcus faecalis (asa1).
- C5a peptidase from Streptococcus pyogenes (scpA).
- C protein alpha-antigen from Streptococcus agalactiae (bca).
- Cell surface antigen I/II (PAC) from Streptococcus mutans.
- Dextranase from Streptococcus downei (dex).
- Fibronectin-binding protein from Staphylococcus aureus (fnbA).
- Fimbrial subunits from Actinomyces naeslundii and viscosus.
- IgA binding protein from Streptococcus pyogenes (arp4).
- IgA binding protein (B antigen) from Streptococcus agalactiae (bag).
- IgG binding proteins from Streptococci and Staphylococcus aureus.
- Internalin A from Listeria monocytogenes (inlA).
- M proteins from streptococci.
- Muramidase-released protein from Streptococcus suis (mrp).
- Nisin leader peptide processing protease from Lactococcus lactis (nisP).
- Protein A from Staphylococcus aureus.
- Trypsin-resistant surface T protein from streptococci.
- Wall-associated protein from Streptococcus mutans (wapA).
- Wall-associated serine proteinases from Lactococcus lactis.

[1861] [...]
[1862] [1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).

[1863] 782. Gamma-glutamyltranspeptidase signature
Gamma-glutamyltranspeptidase (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety [...]

[1865] The sequences of [...](4-carboxybutanamido)cephalosporanic acid (GL-7ACA) [...] Pseudomonas cephalosporin acylases (EC 3.5.1.[...]) [...] related to GGT and also show [...]

[1] [...] Niwa M. Biochim. Biophys. Acta [...]
[...] chelation of a ferrous ion to protoporphyrin IX, [...] inner membrane, whose active site faces [...]

[1] Labbe-Bois R. J. Biol. Chem. 265:7278-7283(1990).
[...] [2]. Enzymes known to contain such a domain are:

- Endoglucanase (gene end1) from Butyrivibrio fibrisolvens.
- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi.
- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.
- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.
- Endoglucanase A (gene celA) from Microbispora bispora.
- Endoglucanases A (gene celA), B (celB) and C (celC) from Pseudomonas fluorescens.
- Endoglucanase A (gene celA) from Streptomyces lividans.
- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.
- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens.
- [...] (EC 3.2.[...]) (xylanase C) (gene xynC) from Pseudomonas fluorescens.
- Chitinase 63 (EC 3.2.1.14) from Streptomyces plicatus.
- Chitinase C from Streptomyces lividans.

[...] residues which could be involved in the interaction of the CBD with polysaccharides.



CxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx
********

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

Consensus pattern: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA]

J., Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1991). 


[...] a bacterial plasmid-encoded enzyme that catalyzes the hydrolysis of [...] dimer, a by-product of nylon manufacture [4].
- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].
- Mammalian fatty acid amide hydrolase (gene FAAH) [6].


[LIVM]-R-x-P-[GSAC]

[1882] 786. Glycosyl hydrolases family 10 active site
[...] several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.[...]) [...] and xylanases which, on the basis of sequence similarities, [...]. Fungi and [...] produce a spectrum of cellulolytic [...] cellulase family F [3] or as the glycosyl hydrolases [...]

- Aspergillus awamori xylanase A (xynA).
- Bacillus sp. strain 125 xylanase (xynA).
- Bacillus stearothermophilus xylanase.
- Butyrivibrio fibrisolvens xylanases [...]. This protein consists of [...] a xylanase.
- Cellulomonas fimi exoglucanase/xylanase (cex).
- Clostridium stercorarium [...].
- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ).
- Cryptococcus albidus xylanase.
- Penicillium chrysogenum xylanase (gene xylP).
- Pseudomonas fluorescens xylanases A (xynA) and [...].
- Ruminococcus flavefaciens [...].

[...] consists of three domains: a N-terminal [...] hydrolases; a central domain composed of [...] hydrolases.

- Streptomyces lividans xylanase A (xlnA).
- Thermoanaerobacter saccharolyticum endoxylanase A (xynA).
- Thermoascus aurantiacus xylanase.
- Thermophilic bacterium R18.B4 xylanase (xynA).

[1] Beguin P. Annu. Rev. [...]
[2] [...] Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
(1991).



[1886] [...]

[...] zinc [...] for their activity.

[1888] This family also includes the following proteins: 

- Escherichia coli galactitol operon protein gatY, which catalyzes the transformation of tagatose [...]

[1889] As signature patterns for this class of enzymes, [...] is located in the first half of the sequence and contains two [...] acidic residues and glycines.

Consensus pattern: [...]-[LIVM]-E-[...]

[1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).

[4] Berry A., Marshall K.E. FEBS Lett. 318:11-16(1993).

[1891] 788. Prolyl [...]
[1892] The [...] from that of the trypsin family of serine proteases, but [...] evolutionary related peptidases whose catalytic [...]

- Prolyl endopeptidase (EC 3.4.21.26).

[...]-terminal side of lysyl and [...] that removes N-terminal dipeptides [...]

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV).

[...] maturation of the alpha-factor precursor.
- Yeast vacuolar dipeptidyl [...]. The enzyme catalyzes the hydrolysis [...]

with a free amino-terminus.
[1893] A conserved serine residue [...] experimentally [...]

[...] contains about 700 to 800 amino acids [...]. Consensus pattern: [...]-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]

[1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).

[2] [...] Rawlings N.D. Biol. Chem. [...]
[3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1896] 789. Formate-tetrahydrofolate [...]

[1897] Formate-tetrahydrofolate ligase (EC 6.3.4.3) ([...] synthetase) (FTHFS) is one of the enzymes [...] of various biosynthetic pathways. In many [...] processes the transfers of one-carbon units [...] between different oxidation states by [...]. [...] FTHFS activity is expressed [...] section of FTHFS.

Consensus pattern: [...]-G-A-A-G-G-G-Y

[1901] 790. Transthyretin signatures
[...] hormone-binding protein that seems to transport thyroxine (T4) [...]

[...] belong to this family:

- Escherichia coli hypothetical protein yedX.
- Bacillus subtilis hypothetical protein yunM.
- Caenorhabditis elegans hypothetical protein R09H10.5.
- Caenorhabditis elegans hypothetical protein ZK697.8.

[...] in the N-terminal extremity starts with a [...]

[...] regions were selected [...]
[...] lysine known to be involved [...] binds thyroxine]

[1905] Consensus pattern: [...]-[FYW]-[GS]-[FY]-[QS]

[1907] 791. Dihydropteroate synthase signatures

All organisms require reduced folate cofactors [...] metabolites. Most microorganisms must synthesize folate de novo, because [...] system of higher vertebrate cells which [...] of folates are therefore [...]

[1909] Dihydropteroate synthase (EC 2.5.1.15) (DHPS) [...] dihydropteroate. This is the second step in the three [...]
[1912] Consensus pattern: [LIVM]-x-[...]
Consensus pattern: [...]-D-[LIVM]-G-[...]-[STAHP]

The following forms of PI4-kinases are known:

- Human PI4-kinase alpha.
- Yeast PIK1, a nuclear protein of 120 Kd.
- Yeast STT4, a protein of 214 Kd.
- Fission yeast hypothetical protein SPAC22E1[...].
(1994). 



[...] R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994).



793. FAD-dependent glycerol-3-phosphate dehydrogenase [...]
[...] catalyzes the conversion of glycerol-3-phosphate [...]




[...] the bacterial or yeast proteins by having an EF-hand calcium-binding [...]

[1923] Consensus pattern: [IV]-Q-[...]
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[...] J. Bacteriol. 173:101-107(1991).

- Yeast nucleolar protein NOP[...].
- Yeast hypothetical protein YBL024W.
- Bacterial protein [...], also known as [...].

[...] related [...]. These proteins are:

- Bacterial and archaebacterial protein moaA, which [...]
SPK1/SAD1 [3]. The latter [...] the only known [...] ortholog of SPK1.
- Protein kinase cds1 from fission yeast [...] recombination.

[...] the ORF of the kinase.

[3] Navas T.A., Zhou Z., Elledge S.J. [...]

[...] 270:1170-1176.

Number of members: 54 
[1935] 798. Glyco_hydro_38 

Number of members: 20

[1937] [1] Henrissat B; Medline: 98313424 Glycosidase families." Biochem Soc Trans 1998;26:153-156.
[1938] 799. HECT 

Number of members: 43
[...] A family of proteins [...]

[...] in the HRDC domain cause human disease.

[...] composed of three domains. The amino-terminal [...] domain [1]. The carboxyl terminal domain [...]
Number of members: 581
[1946] 






[1947] 802. lig_chan



tors. 

Number of members: 128 



Number of members: [...]

[...] calcineurin." Science 1995;267:1510-1512.
[1950] 803. RhoGAP

RhoGAP domain

[1951] GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases.



Number of members: 97 
[1952] 






[1953] 804. vwd

[...] FEBS Lett 1993;327:125-130.

Number of members: 92

[1955] 805. zf-C4_Topoisom
Topoisomerase DNA binding C4 zinc finger 

Number of members: 51

[1956] 806. AIRC
AIR carboxylase 
[...] this domain.
Number of members: 35

[...] genes involved in pattern formation, such as [...] activation of multiple homeotic genes.

activators, such as GCN4, HAP2/3/4 or ADA2. 

- Caenorhabditis elegans protein cbp-1.
- Yeast hypothetical protein YGR056W.
- Yeast hypothetical protein YKR008W.
- Yeast hypothetical protein L9638.1.

[...] with significant similarity to the C-terminal half of the bromodomain [...]

[...]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]
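The recoverable tail of this pattern, read as x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY], mixes fixed gaps such as x(2) with variable gaps such as x(6,8), so the stretch of sequence it covers has no single length. The Python sketch below computes the shortest and longest segment a PROSITE-style pattern can match by summing the repeat bounds of each element. It is an illustration, not part of the source: the name `pattern_span` is hypothetical, only the notation used in this section is supported (single residues, `x`, `[...]` sets, `(m)` and `(m,n)` repeats), and it is applied only to the tail because the start of the pattern is garbled in this copy.

```python
import re

# One pattern element: a residue, 'x' (any residue) or a [...] set,
# optionally followed by a repeat count (m) or a range (m,n).
ELEMENT = re.compile(r"(?:x|\[[A-Z]+\]|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")

def pattern_span(pattern: str) -> tuple[int, int]:
    """Return (shortest, longest) segment length matched by a PROSITE-style pattern."""
    shortest = longest = 0
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        low, high = match.groups()
        min_len = int(low) if low else 1          # a bare element covers exactly one residue
        max_len = int(high) if high else min_len  # x(2) means exactly two residues
        shortest += min_len
        longest += max_len
    return shortest, longest

# Recoverable tail of the pattern (its N-terminal part is missing in this copy).
TAIL = "x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]"
print(pattern_span(TAIL))  # (27, 30)
```

The three-residue spread between the bounds comes entirely from the two variable gaps, x(6,8) and x(12,13).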



[1962] 



(1992). 
[1964] Alpha-actinin is an F-actin cross-linking [...] the first 250 residues of the protein. [...]

the cytoskeleton to the plasma membrane.
[...]-W-x-[LIVM](2)

[1970] Enzymes that belong to this family are:
[...] heterodimer of norC and the catalytic subunit norB. The latter contains the 6 invariant histidine residues and 12 transmembrane segments [5].






[1972] Consensus pattern: [YWG]-[...]
[1973] Note: cytochrome bd complexes do not belong to this family.

Castresana J., Luebben M., Saraste M., Higgins D.G. EMBO J. 13:2516-2525(1994).

Capaldi R.A., Malatesta F., Darley-Usmar V.M.
Biochim. Biophys. Acta 726:135-148(1983).
[4] 

Holm L, Saraste M., Wikstrom M. 
EMBO J. 6:2819-2823(1987). 

[5] 

Saraste M., Castresana J. 
FEBS Lett. 341:1-4(1994).

[...] 810. Eukaryotic [...]

- Nitrate reductase (EC 1.6.6.1), which [...]. Structurally, this enzyme of about 900 amino acids consists of an N-terminal [...] domain, a [...] b5 [...] heme-binding domain (see <PDOC00170>) and a C-terminal [...] reductase domain.

Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C.

Biochim. Biophys. Acta 1057:157-185(1991). 

811. (DNA_ligase) ATP-dependent DNA ligase [...]
PROSITE cross-reference(s): [...]; PS00333; DNA_LIGASE_A2

[...] (EC 6.5.1.2). [...] are ATP-dependent. During the first step of the [...]

[...] residue is the site of adenylation [1,2]. [...] conserved region common to all ATP-dependent DNA ligases is [...]
[...] Signature patterns were developed for both [...] active site residue]
[1982] Consensus pattern: [EDQH]-x-K-x-[DN]-[...]

Lindahl T., Barnes D.E.

Annu. Rev. Biochem. 61:251-281(1992).

[3]

[...] Nucleic Acids Res. 20:5389-5396(1992).

[...] 812. [...]

PROSITE cross-reference(s): PS00977; FAD_G3PDH_1; PS00978; FAD_G3PDH_2
[...] the conversion of glycerol-3-phosphate [...]

[1988] Consensus pattern: [IV]-[...]
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

Austin D., Larson T.J. 

J. Bacteriol. 173:101-107(1991).

Roennow B., Kielland-Brandt M.C.

Yeast 9:1121-1130(1993). 

[...] L.J., McDonald M.J., Lehn D.A., Moran S.M.
J. Biol. Chem. 269:14363-14366(1994).

[1989] 813. [...]

[...] formamidopyrimidine [...] enzyme involved in DNA repair and which [...]. [...] formamidopyrimidine (Fapy) and [...] sites). FPG is a monomeric protein of about 32 Kd which [...]. In addition to its glycosylase [...], FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites).

[The four C's are putative zinc ligands] 

Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S.
Microbiology 141:411-417(1995). 

O'Connor T.R., Graves R.J., de Murcia G., Castaing B., Laval J.
J. Biol. Chem. 268:9063-9070(1993).

[1993] 814. (G_glu_transpept) Gamma-glutamyltranspeptidase signature
PROSITE cross-reference(s): PS00462; G_GLU_TRANSPEPTIDASE

[1994] Gamma-glutamyltranspeptidase (GGT) [...] catalyzes the transfer of the gamma-glutamyl moiety [...] or water (forming glutamate). GGT plays a key role [...]

[...]7ACA) into 7-aminocephalosporanic acid (7ACA) [...] composed of two subunits.



[1] 

Tate S.S., Meister A. 

Meth. Enzymol. 113:400-419(1985). 

Suzuki H., Kumagai H., Echigo T., Tochikura T.

J. Bacteriol. 171:5169-5172(1989).

[3]

Ishiye M., Niwa M.

Biochim. Biophys. Acta 1132:233-239(1992).



815. G-protein gamma subunit profile

PROSITE cross-reference(s): PS50058; [...]
[1999] Guanine nucleotide-binding proteins (G proteins) [...] act as intermediaries in the transduction of signals [...] by transmembrane receptors [...]. The alpha subunit [...] at least 12 different isoforms of gamma [...].

[2001] The Caenorhabditis elegans protein egl-10, which regulates G-protein signalling, contains a G-protein [...].

[...] developed that spans the complete length of the gamma subunit.

[1] 

Pennington S.R. 

Protein Prof. 2:16-315(1995). 

[2003] 816. GNS1/SUR4 family signature

established, are evolutionary related [1]. 

- Yeast hypothetical protein YJL196c.
- Caenorhabditis elegans hypothetical protein C40H1.4.
- Caenorhabditis elegans hypothetical protein D2024.3.

[...] This region is located in a [...] loop.
Consensus pattern: L-x-F-L-H-x-Y-H-H
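The GNS1/SUR4 signature L-x-F-L-H-x-Y-H-H has no variable gaps, so it can be matched without regular expressions: slide a nine-residue window along the sequence and test each position against its allowed residues. The Python sketch below covers only this fixed-length case (no repeat counts); the function names and the toy sequence are hypothetical and are not taken from the source.

```python
from typing import Optional

def parse_fixed_pattern(pattern: str) -> list[Optional[set[str]]]:
    """Turn a fixed-length PROSITE-style pattern into one residue set per position.

    None stands for 'x' (any residue); repeat counts such as x(2) are not handled.
    """
    positions: list[Optional[set[str]]] = []
    for element in pattern.split("-"):
        if element == "x":
            positions.append(None)
        elif element.startswith("[") and element.endswith("]"):
            positions.append(set(element[1:-1]))
        elif len(element) == 1 and element.isupper():
            positions.append({element})
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return positions

def find_fixed_pattern(pattern: str, sequence: str) -> list[int]:
    """Return 1-based start positions of every matching window, overlaps included."""
    positions = parse_fixed_pattern(pattern)
    width = len(positions)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(allowed is None or residue in allowed
               for allowed, residue in zip(positions, window)):
            hits.append(start + 1)
    return hits

# Invented toy sequence, used only to exercise the function.
print(find_fixed_pattern("L-x-F-L-H-x-Y-H-H", "AALAFLHQYHHGG"))  # [3]
```

Because every start position is checked, overlapping occurrences are all reported, which a left-to-right regular-expression search would skip.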
[1] 

Bairoch A. 

Unpublished observations (1996). 

El-Sherbeini M., Clemas J.A. 

J. Bacteriol. 177:3227-3234(1995).



Garcia-Arranz M., Maldonado A.M., Mazon M.J., Portillo F.
J. Biol. Chem. 269:18076-18082(1994).

[...] linked by disulfide bonds [...] (CH1 to CH4).

[2008] The major histocompatibility complex (MHC) molecules are made of two chains. In class I [2] the alpha chain is composed of three extracellular [...] cytoplasmic tail. The beta chain (beta-2-microglobulin) is composed of a single extracellular [...]. [...] both the alpha and the beta chains are composed of two extracellular domains, a [...]

[2009] It is known [4,5] that the Ig constant chain domains and a [...] domain in each type of MHC [...] amino acids and include a [...]

Ig heavy chains type Alpha C region: All, in CH2 [...]. Ig heavy chains type Gamma C region: All, in CH3 and also [...]. Ig heavy chains type Epsilon C region: All, in CH1, [...] and CH4. Ig light chains type Kappa C region: [...]

[...] the cytomegalovirus MHC-I homologous protein [6]. Beta-2-microglobulin: All.

[1] 

Gough N. 

Trends Biochem. Sci. 6:203-205(1981).
[2]

Klein J., Figueroa F.

Immunol. Today 7:41-44(1986).

[3]

Figueroa F., Klein J.
Immunol. Today 7:78-81(1986).

[4]
Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L.

Nature 282:266-270(1979).

[5]

Cushley W., Owen M.J.

Immunol. Today 4:88-92(1983).

[6]

Beck S., Barrell B.G.
Nature 331:269-272(1988).

818. (IGFBP) Insulin-like growth factor binding proteins signature
PROSITE cross-reference(s): PS00222; [...]
[...] inhibit or stimulate the growth promoting effects of the IGFs [...]

factor binding proteins [4,5]:

- Vertebrate protein NOV.

[1] 

Rechler M.M. 

Vitam. Horm. 47:1-114(1993). 
[2] 

[...] Res. 3:243-266(1991).

[...] Endocrinol. Metab. 1:412-417(1990).

Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R.
J. Cell Biol. 114:1285-1294(1991).

[...] V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet J., Perbal B.

Mol. Cell. Biol. 12:10-21(1992).

819. LMWPc: Low molecular weight phosphotyrosine protein phosphatase
820. (myosin_head) ATP/GTP-binding site motif A (P-loop)

PROSITE cross-reference(s): [...]
[...] data analysis, it has been shown [1,2,3,4,5,6] that [...]

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- DNA and RNA helicases [8,9,10].

[2021] [...] because [...]



- Ras family of GTP-binding proteins (Ras, Rho, [...]).
- [...].
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial recA protein (see <PDOC[...]>).
- Bacterial recF protein (see <PDOC[...]>).
- Guanine nucleotide-binding [...] (Gi, Gs, Gt, Go, etc.).

[2022] Consensus pattern: [AG]-x(4)-G-K-[ST]
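PROSITE consensus patterns such as the P-loop motif [AG]-x(4)-G-K-[ST] translate directly into regular expressions: `x` becomes any residue, a bracketed set stays a character class, and a count `(n)` or range `(m,n)` becomes a regex repeat. The Python sketch below is an illustration, not a tool described in this document: `prosite_to_regex` and `scan` are hypothetical names, only the syntax used in this section is supported (PROSITE's `{...}` exclusion sets and `<`/`>` terminus anchors are not), and the test sequence is invented.

```python
import re

# One pattern element: a single residue, 'x' (any residue) or a [...] set,
# optionally followed by a repeat count (n) or a range (m,n).
ELEMENT = re.compile(r"(x|\[[A-Z]+\]|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        body = "." if residue == "x" else residue  # a [..] set is already a regex class
        if low is None:
            parts.append(body)
        elif high is None:
            parts.append(f"{body}{{{low}}}")
        else:
            parts.append(f"{body}{{{low},{high}}}")
    return "".join(parts)

def scan(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

P_LOOP = "[AG]-x(4)-G-K-[ST]"
toy_sequence = "MKKLVVLGASGVGKSTLIN"  # invented example, not a real protein

print(prosite_to_regex(P_LOOP))    # [AG].{4}GK[ST]
print(scan(P_LOOP, toy_sequence))  # [(8, 'GASGVGKS')]
```

Because `re.finditer` resumes searching after the end of each match, overlapping occurrences are not reported; a position-by-position scan is needed when overlaps matter.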

Walker J.E., Saraste M., Runswick M.J., Gay N.J.
EMBO J. 1:945-951(1982).

[2]

Moller W., Amons R.
FEBS Lett. 186:1-7(1985).

Saraste M., Sibbald P.R., Wittinghofer A.
Trends Biochem. Sci. 15:430-434(1990).
[6]

[...] Mol. Biol. 229:1165-1174(1993).

Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P.
J. Bioenerg. Biomembr. 22:571-592(1990).

[...] 2-23(1988) and Nature 333:578-578(1988) (Errata).
Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P.

Nature 337:121-122(1989).

Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989).
[...] Davies R, Devlin K, Feltwell T, Gentles S, [...]

822. (RNB) Ribonuclease II family signature
[...] (EC 3.1.13.1) (RNase II) (gene rnb) [1]. RNase II is an exoribonuclease [...]

[...] 1250 (SSD1). While their sequence is highly [...]

[...]-[FY]-x-D-x(3)-[HQ]

Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993).

Noguchi E., Hayashi N., Azuma Y., [...]
Yanagida M., He X., Mueller U., Sazer S., Nishimoto T.
EMBO J. 15:5595-5605(1996).

Beuf L., Bedu S., Cami B., Joset F.

Plant Mol. Biol. 27:779-788(1995).
[4]

[...] Nucleic Acids Res. 25:3187-3195(1997).
[2029] 823. Src homology 2 (SH2) domain profile

PROSITE cross-reference(s): PS50001; SH2

The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a conserved sequence region between [...]. [...] were later found in many [...]

[...] in a strictly phosphorylation-dependent manner [3]. [...] helices and six to seven beta-strands.

[2031] The SH2 domain [...]

[...] the Src, Abl, Btk, Csk and ZAP70 families of kinases. [...] of the SH2 domain are found in those proteins [...] between the [...]
[...] Mammalian Ras [...] exchange factors to growth factor receptors: vertebrate [...]

- Chicken tensin.
- Yeast transcriptional control protein SPT6.

[...] based on a structural alignment consisting of [...]



Sadowski I., Stone J.C., Pawson T.
Mol. Cell. Biol. 6:4396-4408(1986).

Russell R.B., Breed J., Barton G.J.
FEBS Lett. 304:15-20(1992).

Marengere L.E.M., Pawson T.
J. Cell Sci. Suppl. 18:97-104(1994).
[4]

Pawson T., Schlessinger J.
Curr. Biol. 3:434-442(1993).

Mayer B.J., Baltimore D.
Trends Cell Biol. 3:8-13(1993).
[6] 

Pawson T. 

Nature 373:573-580(1995). 



[...] dysplasia (DTD).

- Sulfate transporters 1, 2 and 3 from [...] Stylosanthes hamata.
- Human protein DRA (Down-Regulated in Adenoma).
- Soybean early nodulin 70.
Sandal N.N., Marcker K.A.

Trends Biochem. Sci. 19:19-19(1994).

[...] F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T.
Mol. Gen. Genet. 247:709-715(1995).

Palmer KJ, Tichelaar W, [...]

Number of members: 37
References: 

[2041] [...] fuculose-1-phosphate aldolase from Escherichia coli.

[2042] 827. CBD_2

Cellulose binding domain found in bacteria.
References:
[...] Cellulomonas fimi by nuclear magnetic resonance spectroscopy." [...] 1995;34:6993-7009.

Number of members: 91
References:

[2045] [...] proteases is required for intramolecular [...]

[2046] 829. Uncharacterized protein family UPF0020 signature
PROSITE cross-reference(s): [...]

The following uncharacterized proteins 

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus [...].
- Bacillus subtilis [...].
- [...] slr0064.

- Methanococcus [...].

[2047] These are hydrophilic proteins ranging from 40 Kd to about 80 Kd. They can be picked up in the database by the following pattern.

[2048] Consensus pattern: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E
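One way to gauge how permissive the UPF0020 consensus pattern D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E is, is to count the distinct 12-residue peptides it accepts by multiplying the number of allowed residues at each position. The Python sketch below assumes an alphabet of the 20 standard amino acids and supports only fixed repeats such as x(3); the function name is hypothetical and not from the source.

```python
import re
from math import prod

ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids

# One element: a residue, 'x' or a [...] set, with an optional fixed repeat (n).
ELEMENT = re.compile(r"(x|\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?")

def count_matching_peptides(pattern: str) -> int:
    """Count the distinct fixed-length peptides a PROSITE-style pattern accepts."""
    choices = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, repeat = match.groups()
        per_position = ALPHABET_SIZE if residue == "x" else len(residue.strip("[]"))
        choices.append(per_position ** int(repeat or 1))
    return prod(choices)

UPF0020 = "D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E"
print(count_matching_peptides(UPF0020))  # 160000
```

Most of that count comes from the three unconstrained x positions (20^3 = 8000); even so, 160,000 peptides is only about 3.9e-11 of the 20^12 possible 12-residue peptides.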

References:

- Yeast chromosome XI hypothetical protein YKL151c.
- Caenorhabditis elegans hypothetical protein R107.2.
- Escherichia coli hypothetical protein yjeR.
- Bacillus subtilis hypothetical protein [...].
- [...] hypothetical protein [...].

[2051] These are proteins of about [...]

[...] the database by the following pattern: [...]-Q-[PNS]-[...]-L-[GP]-x-[DENQT]



Number of members: 39



Number of members: 39
[...] 98192624 Molecular [...] mass in [...]

[2056] 832. (AICARFT_IMPCHas)



ribonucleotide) to IMP [1]. 
Number of members: 22 
[2058] 
[2059] 833. (AOX)

by low temperature [1]. 
Number of members: 27 

[2062] 834. (APH)

Protein kinases [...] PROTEIN_KINASE_ATP; PS00108; [...]

[...] catalytic core common to [...]
[...] covers the entire catalytic domain. [...]-[GSTACLIVMFY]-x-[...]

[...] sequences known to belong to [...]

[...] specific protein kinases [...] and [...]viruses ganciclovir kinases [10], which [...]
[...] aminoglycoside [...]

[...] is close to 100%. Note: eukaryotic-type protein kinases [...]

[...] and Yersinia pseudotuberculosis [...] and a profile. As the profile is much more [...]
[2071]

[1] Hanks S.K., Hunter T., FASEB J. 9:576-596(1995).

[2] Hunter T., Meth. Enzymol. [...]

[3] Hanks S.K., Quinn A.M., Meth. Enzymol. 200:38-62(1991).
[8] Benner S., Nature 329:21-21(1987).

[2072] 835. (Asp_Glu_race) 

[...] enzymes that do not seem to require a cofactor [...] and some peptide-based antibiotics such as [...]

[...] glutamate into D-glutamate, is required for the biosynthesis of [...] this family also includes a hypothetical [...]

[2077] 836. (ATP-sulfurylase) 
[...] polypeptide chain associated [...] from inorganic sulphate [2]. ATP sulfurylase [...]

Number of members: 37 
[2079] 

[...] Krishnan S. [...]

both ATP sulfurylase and APS kinase activities." Gene 1995;165:243-24[...].
[2080] 837. (ATP-synt_F) 

[2081] The family also includes archaebacterial ATP synthase subunit F [3].



Number of members: 23 
[2082] 
[...] a 14-kDa F-subunit of the vacuolar ATPase [...]

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; [...]" J Biol Chem 1996;271:3324-3327.

[...] with the catalytic sector of clathrin-coated vesicle H+-ATPase [...] subunit and [...]

18843-18852. 

[2083] 838. (CBD_4) 
Starch binding domain 

Number of members: 48 

[2084] 839. (CbiX)
[...] cobalamin biosynthesis operons and so may have a [...]

might be involved in metal chelation [1]. 
Number of members: 6 

840. (Complex1_51K)

[2087] NADH [...] Kd [...]

PROSITE cross-reference(s): [...]; COMPLEX1_51K_1; PS00645; COMPLEX1_51K_2

[2088] Respiratory-chain NADH [...] (complex I, or NADH-ubiquinone oxidoreductase) is an oligomeric enzymatic complex [...] membrane which also seems to exist in the chloroplast and in cyanobacteria [...]. Among the 25 to 30 [...] a molecular weight of 51 Kd (in mammals), [...]

seems to bind to NAD, FMN, and a 2Fe-2S cluster.
[2089] The 51 Kd subunit is highly similar to [3,4]. 

- Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

[...] The 51 Kd [...]

[2092] 841. (DAP_epimerase)
Diaminopimelate epimerase signature

the first of these two active site cysteines were selected. 
[2096] 842. (DNA_gyraseB_C)

DNA topoisomerase II signature [...]

[2097] PROSITE cross-reference(s): PS00177; TOPOISOMERASE_II
DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,5] is one of the [...] that catalyze the [...] of topological DNA isomers. Type II topoisomerases [...] a DNA segment through a transient double-strand break. Topoisomerase II is found in [...] prokaryotes, eukaryotes, and [...] three subunits (the product of [...]).

[...] between the different subtypes of topoisomerase II. The [...] between the different subunits is shown in the following representation:



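<--------------------- About 1400 residues --------------------->

[------ Protein 39 --*---][--------- Protein 52 ---------]   Phage T4
[--------- gyrB ---------][----------- gyrA -----------]     Prokaryote II
Archaebacteria
[------ parE ---*--------][----------- parC -----------]     Prokaryote IV
[------------*---------------------------------------------]  Eukaryote and ASF

'*': position of the pattern.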

[2100] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] 
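A rough way to judge the selectivity of a signature such as [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] is to compute the chance that one random window matches it and scale that by the number of residues searched. The Python sketch below makes two simplifying assumptions that real protein databases violate: every residue is equally frequent (1/20) and positions are independent. The 10^8-residue database size is arbitrary, end effects are ignored, and the function names are hypothetical.

```python
import re
from fractions import Fraction

# One element: a residue, 'x' or a [...] set, with an optional fixed repeat (n).
ELEMENT = re.compile(r"(x|\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?")

def match_probability(pattern: str, alphabet_size: int = 20) -> Fraction:
    """Chance that one random window matches, assuming equiprobable independent residues."""
    probability = Fraction(1)
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, repeat = match.groups()
        allowed = alphabet_size if residue == "x" else len(residue.strip("[]"))
        probability *= Fraction(allowed, alphabet_size) ** int(repeat or 1)
    return probability

def expected_chance_hits(pattern: str, residues_scanned: int, alphabet_size: int = 20) -> float:
    """Rough number of random matches when scanning the given number of residues."""
    return float(match_probability(pattern, alphabet_size) * residues_scanned)

TOPO_II = "[LIVMA]-x-E-G-[DN]-S-A-x-[STAG]"
print(match_probability(TOPO_II))                  # 1/32000000
print(expected_chance_hits(TOPO_II, 100_000_000))  # 3.125
```

Exact fractions avoid rounding in the per-position product; with realistic amino-acid frequencies the figure shifts, so it is best read as an order-of-magnitude estimate.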

[1] Sternglanz R., Curr. Opin. Cell Biol. 1:[...]

[2] Bjornsti M.A., Curr. Opin. Struct. Biol. 1:99-103(1991).

[3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).
[4] Roca J., Trends Biochem. Sci. 20:156-160(1995). 

[2101] 843. (DUF16) 

Protein of unknown function

[2102] The function of this protein is unknown. It appears to occur only in Mycoplasma pneumoniae.

Number of members: 26 
[2104] 844. (DUF21) 

[2105] Domain of unknown function
[...] sequences in this family are annotated as [...]

Number of members: 42
[2107] 845. (DUF56) 

However, the family includes Sec59 from yeast. Sec59 is a dolichol kinase (EC:2.7.1.108), but it is not clear whether the enzymatic activity resides in this region or in its N-terminal region.
Number of members: 13 
[2110] 846. (DUF94) 

[2111] Domain of unknown function
[2112] The function of this domain is unknown. It is found in both eukaryotes and archaebacteria. The alignment contains a completely conserved aspartate residue that may be functionally important. The eukaryotic proteins also contain three conserved cysteines and a histidine that might be metal binding; however, these are absent in the archaebacterial proteins.

Number of members: 9 

[2113] 847. (FF) 
[2114] FF domain 

[2115] This domain may be involved in protein-protein interaction [1]. 
Number of members: 42 

[2116] [1] Bedford MT, Leder P; Medline: 99322199 "The FF domain: a novel motif that often accompanies WW domains." Trends Biochem Sci 1999;24:264-265.
[2117] 848. (FLO_LFY)

[2118] This family consists of various plant development proteins which are homologues of floricaula (FLO) and Leafy (LFY) proteins, which are floral meristem identity proteins. Mutations in the sequences of these proteins affect flower and leaf development.

Number of members: 16 

[2119] 

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews P, Michael A, Ellis N; Medline: 97411151 "UNIFOLIATA regulates leaf and flower morphogenesis in pea." Curr Biol 1997;7:581-587.

[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452 "LEAFY controls floral meristem identity in Arabidopsis." Cell 1992;69:843-859.

[2120] 849. (G-patch) 

[2121] This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA binding domains. This suggests that this domain may have an RNA binding function. This domain has seven highly conserved glycines.

Number of members: 47 

[2122] [1] Aravind L, Koonin EV; Medline: 10470032 "G-patch: a new conserved domain in eukaryotic RNA-processing proteins and type D retroviral polyproteins." Trends Biochem Sci 1999;24:342-344.
[2123] 850. (Gram-ve_porins) 
General diffusion Gram-negative porins signature 
[2124] Cross-reference(s) PS00576; GRAM_NEG_PORIN

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins known as porins [1] are responsible for the "molecular sieve" properties of the outer membrane. Porins form large water-filled channels which allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general diffusion channels that allow any solute up to a certain size (that size is known as the exclusion limit) to cross the membrane, while other porins are specific for a solute and contain a binding site for that solute inside the pores (these are known as selective porins). As porins are the major outer membrane proteins, they also serve as receptor sites for the binding of phages and bacteriocins. General diffusion porins generally assemble as a trimer in the membrane, and the transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been shown [3] that a number of general porins are evolutionarily related; these porins are:

- Enterobacteria phoE.
- Enterobacteria ompC.
- Enterobacteria ompF.
- Enterobacteria nmpC.
- Bacteriophage PA-2 LC.
- Neisseria PI.A.
- Neisseria PI.B.

[2125] As a signature pattern, a conserved region was selected, located in the C-terminal part of these proteins, which spans two putative transmembrane beta strands.

[2126] Consensus pattern: [LIVMFY]-x(2)-[...]-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988). 

[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).

[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

[2127] 851. (HlyD)

HlyD family secretion proteins signature

[2128] Cross-reference(s) PS00543; HLYD_FAMILY

Gram-negative bacteria produce a number of proteins which are secreted into the growth medium by a mechanism that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, require the help of two or more proteins for their secretion across the cell envelope. Among these are a protein belonging to the ABC transporter family (see the relevant entry <PDOC00185>) and a protein belonging to a family which is currently composed [1 to 5] of the following members:



Gene | Species | Protein which is exported
hlyD | Escherichia coli | Hemolysin
appD | A. pleuropneumoniae | Hemolysin
lcnD | Lactococcus lactis | Lactococcin A
lktD | A. actinomycetemcomitans and Pasteurella haemolytica | Leukotoxin
rtxD | A. pleuropneumoniae | Toxin-III
cyaD | Bordetella pertussis | Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA | Escherichia coli | Colicin V
prtE | Erwinia chrysanthemi | Extracellular proteases B and C
aprE | Pseudomonas aeruginosa | Alkaline protease
emrA | Escherichia coli | Drugs and toxins
yjcR | Escherichia coli | Unknown



These proteins are evolutionarily related and consist of 390 to 480 amino acid residues. They seem to be anchored in the membrane by an N-terminal transmembrane region. Their exact role in the secretion process is not yet known. The C-terminal section of these proteins is the best conserved region; a signature pattern was selected from that region.
[2129] Consensus pattern: [LIVM]-x(2)-G-[LM]-x(3)-[STGA]-x-[LIVMT]-x-[LIVMT]-[GE]-x-[KR]-x-[LIVMFYW](2)-x-
Sequences known to belong to this class detected by the pattern: ALL, except for emrA and yjcR.
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).



[2131] 852. (IBR) 

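In Between Ring fingers

[2132] The IBR (In Between Ring fingers) domain is found between pairs of ring fingers (zf-C3HC4). This domain has also been called the DRIL (double RING finger linked) domain [2].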
Number of members: 25 
[2133] 






[2134] 853. (IPPT) 
IPP transferase 



Medline: 97440126 "The modified nucleoside [...]
[2135] 854. (KE2) 

KE2 family protein

[2136] The function of members of this family is unknown, although they have been suggested to contain a DNA binding leucine zipper motif [2].

Number of members: 9 
[2137]

[1] [...] H-2K [...] Gene 1991;107:345-346.

[2] Shang HS, Wong [...]; "YKE2, a yeast nuclear gene encoding a protein showing homology to mouse KE2 and containing [...]

[2138] 855. (Lipoprotein_6)

Prokaryotic membrane lipoprotein lipid attachment site

[2139] Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). Some of the proteins known to undergo such processing currently include (for recent listings see [1,2,3]):
- Major outer membrane lipoprotein (murein-lipoprotein) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein.
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Rickettsia 17 Kd antigen.
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).

References
[2143]

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).

[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).

[4] [...], Kent S.B., [...], Oesterhelt D., Engelhard M., J. Biol. Chem. 269:14939-14945(1994).

[2144] 856. (Lipoprotein_7) 
Adhesin lipoprotein 
Number of members: 18
4027-4034. 

region of any of these proteins. The maoC gene [...] monoamine oxidase [1].

Number of members: 46

[...] monoamine-regulated [...] J Bacteriol 1992;174:2485-2492.

[2151] This family consists of the [...] manganese-stabilizing protein, as it is associated [...]

the variable shell." Genome Res 1999;9:608-628.
Number of members: 27 
Number of members: 23
[2157] 



[...] nonprotein component required for [...] Nucleolar KKE/D repeat proteins Nop56p and Nop58p [...]
1164-1170. 

Number of members: 36 

[2160] 862. (NTP_transf_2)

superfamily." Trends Biochem Sci 1995;20:345-347.
[2162] 863. (Paramyxo_P)
Number of members: 15
[2164] 



Medline: 99329169 "Dissection of Individual Functions of the Sendai [...]



Number of members: 21 
[2167]

[2168] 865. (Pentapeptide_2)

Number of members: 362 
[2171] 866. (Peptidase_C13)

Members of this family are asparaginyl peptidases [1].
Number of members: 26

J Biol Chem 1997;272:8090-8098. 
[2173] 867. (Pro_dh) 
Proline dehydrogenase

[...] domains of the multifunctional Escherichia coli PutA protein." J [...]
[...] and delta 1-[...]

to be fully operational in vivo, it increases [...]
required for high affinity binding of Ca2 + [2]. 

Number of members: 25 

[2177] [...] "Photoactivation and photoinhibition are [...]

binding domain. 

[...] "Novel predicted RNA-binding domains associated with the translation machinery." [...]

[...] This awaits experimental verification.
Number of members: 25

[2184] 871. (Ribosomal_L14e) 
Number of members: 15
[2186] 872. (Ribosomal_S27) 



Number of members: 36 




Number of members: 39
[2190] 




Number of members: 32 

[...] TFIIE-alpha subunits, also Swiss:O29501 [3].
Number of members: 12
[...] Nature 1997;390:364-370.


(gene TGM3).

- Transglutaminase 4 (gene TGM4).

[...] in SWISS-PROT: NONE.

[2199] [1] Ichinose A., Bottenus R.E., Davie E.W., J. Biol. Chem. 265:13411-13414(1990).
[2] Greenberg C.S., Birck[...]

Number of members: 33

[2201] [...] Medline: 96079944 "Purification, cloning, and properties [...]

[...] UTP-glucose-1-phosphate uridylyltransferase (EC 2.7.7.9), also known as UDP-glucose pyrophosphorylase [...]



Number of members: 18
[2203]
[1] 



[...] "Sequence differences between [...] of Dictyostelium discoideum." [...] "The eukaryotic UDP-N-acetylglucosamine [...]" [...] 1998;273:14392-14397.
yqel. 

- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus influenzae protein.
- Methanococcus jannaschii hypothetical protein MJ0652.

[...] These fingers also mediate IL-1-induced NF-kappaB activation.
Number of members: 22 



Heyninck K, Beyaert R; Medline: [...]



[2208] 881. (zf-PARP) 



[...] of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various cellular processes such as differentiation, proliferation [...], as well as in the regulation of the molecular events involved in the recovery of the cell from [...]. PARP is about 1000 amino-acid residues long and consists of [...] a central automodification domain [...].

[1] Althaus F.R., Richter C., Mol. Biol. Biochem. Biophys. [...]

[...] Barnes D.E., Haseltine W.A., Lindahl T., Mol. Cell. Biol. 15:3206-3216(1995).

A. Asparaginase 2



Chloroa_b-bind

[...] chlorophyll a/b binding proteins [...]

[...] increase the rate of photosynthesis.

DMRL synthase

DMRL synthase (6,7-dimethyl-8-ribityllumazine synthase) [...] 3,4-dihydroxy-2-butanone [...]



Yang et al. (1993) PNAS USA 90: 5086-90
Ustav and Stenlund (1991) EMBO J 10: 449-57
Callaway et al. (1996) Mol Plant Microbe Interact 9: 810-8



E. EF1_G 



Elongation factor 1 gamma

[...] presumed to play a role in anchoring the complex [...] organs or in response to stress. Manipulation [...] that different forms of the EF-1 subunits [...], or by natural mutation, may result in the accommo[...]



Ref:

Biol 17: 351-60 

FIMV polyprotein

[...] mammalian species, retroviruses are known to [...] and have been shown to induce mutant [...]
Genetics 149: 703-15

[...] hydrolysis of 1,4-alpha-glucosidic linkages in [...]

[...] et al. (1988) J Bacteriol 170: 5848-54

[...] such as elasticity, of plant cell walls.

roots, for example



[...] to be regulated by circadian rhythms and in a light-[...] manner [...]

[...] et al. (1993) Plant Mol Biol 23: 619-25



Anticancer Drugs 8 Suppl 1: S47-51

K. Oxidored_FMN

[2225] NADPH dehydrogenase [...] NADPH + acceptor = NADP+ + reduced acceptor. One member of this family is yeast "old yellow enzyme" (OYE) [...]. In addition to exhibiting oxidoreductase activity, a yeast family member is a protein that binds estrogen [...]. Estrogen binding proteins in plants have been reported [...]. An Arabidopsis homolog to OYE has been [...] modify lipid metabolism/catabolism. [...] tissues.

Ref: [...]

Mandani et al. (1994) PNAS USA 91: 922-6 
[2226] The NADH-plastoquinone oxidoreductase [...] in plants these reactions [...] excess reduction equivalents in the chloroplast.

Mol Biol 251: 614-28 

Polyadenylate-binding protein (PABP)

[...] PABPs that are expressed in [...]. PABP5 is [...] and translational initiation. [...] activity of PABP [...]

Q. Pkinase_C

[...] tissues and are known to [...]

R. REV

[2230] The REV proteins act post-transcriptionally [...] GAG and ENV production in [...] retrovirus-like viruses such as [...]

S. RuBisCO_small

[...] C3 photosynthesis [...]

(1994) J Biol Chem 269: 1394-401
[2233] Many plant proteins [...] those found in both components of the prokaryotic family of signal transduction [...]: the [...] domain and the receiver domain. [...] transfer of a phosphate group [...] signal transduction [...] nutrient availability, etc.



Ref: 



[...] et al. (1993) Gene 131: 119-124; Gottfert et al. (1990) PNAS USA 87: 2680-4; Chang et al. (1993) Science 262: 539-44; Nagaya et al. (1993) [...]



V. vMSA

[2234] vMSA proteins are major surface [...] various retroviruses. Surface antigens of retroviruses are often [...], thus enabling targeting of the [...]

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83; Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas (1998) Genetics 149: 703-15



zf-RanBP

[...] repeats, and may contain [...] Proteins such as these may [...]

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43; Hernandez Torres et al. (1995) [...] 27: 1221-6

Y. Peptidase M48

[2237] Proteins belonging to this family [...] cofactor and are located in the membranes of the endoplasmic reticulum [...]. The yeast STE24 gene product [...]

Viral RNA polymerase

[...] polymerase isolated from several [...] Wilson et al. (1994)

Calpain_inhib

[...] Calpain is a non-lysosomal calcium-dependent [...]

[2240] Refs

Ab. [...] AS genes catalyze the [...]

[2242] Refs

Ac. [...] susceptible to infection by [...]

Ad. Peptidase M41

[...] that bind zinc as a cofactor and are integral [...] chloroplasts; their expression is light regulated. [...] These proteins are [...] the development of chloroplasts.
Aa. UPF0051 



production. 
[2248] Refs 

[...] known as a potent immortalizing and transforming agent. [...] association of E7 with cellular proteins regulating entry into the cell cycle and [...] suppression of terminal differentiation in mammalian cells [...]. [...] E7 protein enables the products of [...] proliferation characteristics and possibly altered morphology. For example, [...] expected to result in proliferation of [...] suppression of differentiation events, such as [...] or flowering.

Zwerschke W, Jansen-Durr P. Adv Cancer Res 2000;78:1-29

Peptidase U7

chloroplasts, thereby affecting growth and development. 

Polypeptides Comprising Signal Peptides

[2253] Polypeptides comprising signal peptides [...] receptors, proteins retained in the ER, etc., modulate ligand-receptor interactions, cell-to-cell [...] an organism outside or within [...] transported out of the cell. These proteins can act [...] also modulate signal transduction and communication between cells.

[...] halting translation, and the resulting [...]

[...] homologous sequences [...] a part or fragment of these [...] transforming the genome of a host cell [...] inserted into the host cell to deliver the [...]

Suppression methods, such as:

Antisense;
Ribozymes;
Regulatory Sequence Modulation;

as well as Methods for Enhancing Production, such as:

Insertion of Exogenous Sequences; and
Regulatory Sequence Modulation.
encodes the enzyme of interest; see, e.g., Sheehy et al., Proc. Natl. Acad. Sci. USA [...]; U.S. Patent No. 4,801,340.

III.A.2. Ribozymes

[2270] Similarly, ribozymes can be [...] mRNA [...]

III.A.3. Co-Suppression

[2271] Methods [...]

III.A.4. Insertion of Sequences into the Gene to be Modulated

[2272] Yet another means of suppressing [...] disrupt transcription or translation of the gene [...] polynucleotide insert to a gene using the Cre-Lox system [...]

[2273] Homologous recombination [...] (A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998)) [...] et al., Plant J. 7:649 (1995)).

[2274] In addition, random insertion of polynucleotides into a genome can also be used to disrupt the gene of interest (Azpiroz-Leehan et al., Trends Genet. [...] (1997)). In this method, screening for clones from a library containing random insertions is preferred for [...] described above based on sequences from the REF and SEQ Tables [...], fragments thereof, and substantially similar sequences thereto. The screening can also [...]

III.A.5. Regulatory Sequence Modulation

[...] and fragments thereof are examples of nucleotides [...] translation from a gene of interest as discussed in I.C.5.

III.A.6. Genes Comprising Dominant-Negative Mutations

[...] mutant polypeptide which is capable of competing with the native [...] activity of the [...] protein and therefore competes with such native protein [...] native protein itself to [...]

[2277] Products from genes comprising dominant-negative mutations [...] as one subunit of a hetero[...] activity. For example, the native [...] can inhibit activity. [...] comprising a gene comprising a dominant-negative mutation.

III.B. Enhanced Expression

[2279] Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an exogenous gene; or (2) promoter modulation.
III.B.1. Insertion of an [...] Encoding Sequence

[...] to environmental stimuli.

III.B.2. Regulatory Sequence Modulation

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA [...]

(g) Plasmid vectors: Sambrook et al., infra.
ically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes,

[2292] The vector comprising the sequences from genes or SDFs of the invention may comprise a marker gene that confers a selectable phenotype on plant cells. For instance, the vector can include a promoter and a coding sequence. For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin or hygromycin, or herbicide resistance, such as resistance to [...]

IV.A. Coding Sequences 

[2293] Generally, the sequence in the transformation vector that is to be introduced into the genome of the host cell does not need to be absolutely identical to an SDF of the present invention. Also, it is not necessary for it to be full length relative to either the primary transcription product or the fully processed mRNA. Furthermore, the introduced sequence need not have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments can be incorporated into the coding sequence without changing the desired amino acid sequence of the polypeptide to be produced.
IV.B. Promoters

[2294] As explained above, introducing an exogenous SDF from the same species or an orthologous SDF from another species can modulate the expression of a native gene corresponding to that SDF of interest. Such an SDF construct can be placed under the control of either a constitutive promoter or a highly regulated inducible promoter (e.g., a copper inducible promoter). The promoter of interest can initially be either endogenous or heterologous to the species in question. When re-introduced into the genome of said species, such a promoter becomes exogenous to said species. Overexpression of an SDF transgene can lead to co-suppression of the homologous endogenous sequence, thereby creating some alterations in the phenotypes of the transformed species (see, e.g., Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., Plant Cell 2:291 (1990)). If an SDF [...], its accumulation can be manipulated in an organ- or tissue-specific manner utilizing a promoter having such specificity.

[2295] Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to be tissue-specific or developmentally regulated, such a promoter can be utilized to drive or facilitate the transcription of a specific gene of interest (e.g., a seed storage protein or a root-specific protein). Thus, the level of accumulation of a particular protein can be manipulated, or its spatial localization in an organ- or tissue-specific manner can be altered.

IV.C. Signal Peptides

[2296] SDFs of the present invention containing signal peptides are indicated in the REF and SEQ Tables. In some cases it may be desirable for the protein encoded by an introduced exogenous or orthologous SDF to be targeted (1) to a particular organelle or intracellular compartment, (2) to interact with a particular molecule such as a membrane molecule, or (3) for secretion outside of the cell harboring the introduced SDF. This will be accomplished using a signal peptide.

[2297] Signal peptides direct protein targeting, are involved in ligand-receptor interactions and act in cell-to-cell communication. Many proteins, especially soluble proteins, contain a signal peptide that targets the protein to one of several different intracellular compartments. In plants, these compartments include, but are not limited to, the endoplasmic reticulum (ER), [...], protein storage vesicles (PSV) and, in general, membranes. Some signal peptide sequences are conserved, such as the Asn-Pro-Ile-Arg amino acid motif found in the N-terminal propeptide signal that targets proteins to the vacuole ([...] (1999)). Others are composed largely of hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale and Denecke (1999) The Plant Cell 11:615-628). Still others do not appear to contain either a consensus sequence or an identifiable secondary structure, for instance the chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The Plant Cell 11:557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an organelle and then to a membrane within the organelle (e.g., within the thylakoid lumen of the chloroplast; see Keegstra and Cline (1999) The Plant Cell 11:557-570). In addition to the diversity in sequence and secondary structure, the placement of the signal peptide is also varied. Proteins destined for the vacuole, for example, have targeting signal peptides found at the N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal peptides also serve as [...]

[2298] These characteristics of signal peptides can be used to more tightly control the phenotypic expression of introduced SDFs [...]
[...] of the protein in specific organelles (plastids, as an example), secretion outside of the cell, targeting interaction with particular receptors, etc. Hence, the [...] SDFs of the invention [...] increases the range of manipulation of SDF [...]. The sequence of the signal peptide can be isolated from characterized genes using common molecular biological techniques or can be synthesized in vitro [...] described in the REF and [...]

[2300] Also, fragments of the signal peptides of the invention are useful and can be fused with other signal peptides of interest to modulate transport of a polypeptide.
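The paragraphs above distinguish signal peptides that carry a conserved motif (such as the Asn-Pro-Ile-Arg, i.e. NPIR, vacuolar motif) from those that are simply rich in hydrophobic residues (such as ER-targeting peptides). As a rough illustration only, and not a substitute for dedicated signal-peptide predictors, the sketch below scans the N-terminal region of a protein for the NPIR motif and reports its most hydrophobic window on the Kyte-Doolittle hydropathy scale. The window size (11), the region length (40) and the example sequences are arbitrary, hypothetical choices.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_hydropathy(seq: str, window: int = 11, region: int = 40) -> float:
    """Highest mean hydropathy of any `window`-residue stretch in the N-terminal `region`."""
    head = seq[:region].upper()
    if len(head) < window:
        raise ValueError("N-terminal region is shorter than the window")
    return max(
        sum(KYTE_DOOLITTLE[aa] for aa in head[i:i + window]) / window
        for i in range(len(head) - window + 1)
    )


def has_npir_motif(seq: str, region: int = 40) -> bool:
    """True if the Asn-Pro-Ile-Arg (NPIR) motif occurs in the N-terminal region."""
    return "NPIR" in seq[:region].upper()


if __name__ == "__main__":
    er_like = "M" + "L" * 12 + "SAKDEQRG" * 3   # hypothetical hydrophobic N-terminus
    vacuolar_like = "MANPIRSTTEKQDGS" * 3       # hypothetical NPIR-containing N-terminus
    for name, s in [("er_like", er_like), ("vacuolar_like", vacuolar_like)]:
        print(name, round(max_hydropathy(s), 2), has_npir_motif(s))
```

A high hydropathy score or an NPIR hit only flags a candidate; as the text notes, some targeting signals (for example chloroplast stromal peptides) show neither feature.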

V. Transformation Techniques 

[2301] A wide range of techniques for inserting exogenous polynucleotides are known for a number of host cells, including, without limitation, bacterial, yeast, mammalian [...] cells. Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. [...] of the invention may be introduced into the genome of the desired plant host by a variety of conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the plant cell using techniques such as [...], or the DNA constructs can be introduced directly to plant tissue using [...]. Alternatively, the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct [...] (see, e.g., [...] Herrera-Estrella et al., EMBO J. 2:987 (1983)). [...] Agrobacterium-mediated transformation techniques [...] are well described in the scientific literature. See, for example, Hamilton, C.M., Gene [...]; Muller et al., Mol. Gen. Genet. 207:171 (1987); Komari et al., [...]; [...] Plant Mol. Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7 [...].

[2305] Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant that possesses the transformed genotype and thus the desired phenotype, such as seedlessness. Such regeneration techniques [...] growth medium, typically relying on a biocide and/or herbicide marker [...] together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in [...], Protoplasts Isolation and Culture, in Handbook of Plant Cell Culture, pp. 124-176, Macmillan [...], 1983; and Binding, [...]. Regeneration can also be obtained [...]

[...] can be used to confer desired traits on individual plants [...]. Thus, the invention has use over a broad range of plants, including species from the genera [...] Cocos, Coffea, [...] Theobromus, [...]. [...] is incorporated in transgenic plants and [...]

[...] deposited under the Budapest Treaty at the American Type Culture Collection [...] Plant Journal 15:707-720

[...] the phosphate groups present on the 5' ends of uncapped incomplete mRNAs, the subsequent decapping of mRNAs having [...]

[...] tag. Further detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is provided in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat, 20 Dec 1993), EPO 625572 and Kato et al., Gene 150:243-250 (1994).

[2319] In both the chemical and the enzymatic approach, the oligonucleotide tag has a restriction enzyme site (e.g., an EcoRI site) therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the mRNA, the integrity of the mRNA is examined by performing a Northern blot using a probe complementary to the oligonucleotide tag.

[2320] For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic method, first strand cDNA synthesis is performed using an oligo-dT primer with reverse transcriptase. This oligo-dT primer can contain an internal tag of at least 4 nucleotides, which can be different from one mRNA preparation to another. Methylated dCTP is used for cDNA first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps. The first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline hydrolysis to eliminate residual primers.

[2321] Second strand cDNA synthesis is conducted [...] with a primer corresponding to the 5' end of the ligated oligonucleotide. The primer is typically 20-25 bases in length. Methylated dCTP is used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during subsequent steps.

[2322] Following second strand synthesis, the full-length cDNAs are cloned into a phagemid vector, such as pBlueScript™ (Stratagene). The ends of the full-length cDNAs are [...], and the cDNA is digested with EcoRI. Since methylated dCTP is used during cDNA synthesis, the EcoRI site present in the tag is the only hemi-methylated site, and hence the only site susceptible to EcoRI digestion. In some instances, to facilitate subcloning, a Hind III adapter is added to the 3' end of full-length cDNAs.
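The cloning step relies on the point made above: because methylated dCTP is incorporated during cDNA synthesis, only the EcoRI site (G^AATTC) carried by the oligonucleotide tag remains cleavable. The toy model below illustrates that logic on a single-strand string; it is a sketch under simplifying assumptions, not a simulation of the enzyme chemistry. In particular, `tag_len` (the length of the 5' tag region whose sites are treated as unprotected) and the example sequence are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts G^AATTC


def find_sites(seq: str, site: str = ECORI_SITE) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def cleavable_sites(cdna: str, tag_len: int) -> list[int]:
    """Sites lying entirely within the 5' tag. This toy model treats them as the
    only unprotected sites; every other site counts as methylation-protected."""
    return [i for i in find_sites(cdna) if i + len(ECORI_SITE) <= tag_len]


def digest(cdna: str, tag_len: int) -> list[str]:
    """Fragments produced when only the cleavable sites are cut (after the G)."""
    cuts = [i + 1 for i in cleavable_sites(cdna, tag_len)]
    bounds = [0] + cuts + [len(cdna)]
    return [cdna[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    tagged = "ATGAATTCGG" + "CCGAATTCAA"  # hypothetical: 10-nt tag + insert with an internal site
    print(find_sites(tagged))             # [2, 12]
    print(digest(tagged, tag_len=10))     # ['ATG', 'AATTCGGCCGAATTCAA']
```

The internal site at position 12 survives the digest, which is the property that lets the full-length insert be cloned directionally.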

[2323] The full-length cDNAs are then size fractionated using either exclusion chromatography (AcA, Biosepra) or [...] separation, which yields 3 to 6 different fractions. The full-length cDNAs are then directionally cloned into the phagemid vector using either the EcoRI and SmaI restriction sites or, when the Hind III adapter is present in the full-length cDNAs, the EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.
[2324] Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as follows.
[2325] The plasmid cDNA libraries made as described above are purified (e.g., by a column available from Qiagen). A positive selection of the tagged clones is performed as follows. Briefly, in this selection procedure, the plasmid DNA is converted to single stranded DNA using phage F1 gene II endonuclease in combination with an exonuclease ([...] et al., Gene 127:95 (1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124 (1992). Here, the single stranded DNA is hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 5' end of the oligonucleotide tag. Preferably, the primer has a length of 20-25 bases. Clones including a sequence complementary to the biotinylated oligonucleotide are selected by incubation with streptavidin-coated magnetic beads followed by magnetic capture. After capture of the positive clones, the plasmid DNA is released from the magnetic beads and converted into double stranded DNA [...] (Amersham Pharmacia Biotech). Alternatively, protocols such as the Gene Trapper™ kit (Gibco BRL) can be used. The double stranded DNA is then transformed, preferably by electroporation, into bacteria. The percentage of positive clones having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot analysis.
[2326] Following transformation, the libraries are ordered … was deposited at the American Type Culture Collection on January 7, 2000 as "E. coli liba 010600" under the accession number PTA-1161.


EXAMPLE 2: SOUTHERN HYBRIDIZATIONS 

[2327] The SDFs of the invention can be used in Southern hybridizations … extraction of DNA from nuclei of plant cells, digestion of the nuclear DNA and separation of the fragments by length, transfer of the separated fragments to membranes, preparation of probes for hybridization, hybridization, and detection of the hybrids.

[2328] The procedures described herein can be used to isolate related polynucleotides or for diagnostic purposes. Moderate stringency hybridization conditions, as defined above, are described in the present example. These conditions result in detection of hybridization between sequences having at least 70% sequence identity. As described above, the hybridization and wash conditions can be changed to reflect the desired percentage of sequence identity between probe and target sequences that can be detected.
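To make the identity figure concrete, the sketch below computes percent identity for two sequences that are already aligned without gaps. This is a simplification assumed for illustration: identity values used in practice come from an alignment program and depend on how gaps are handled, and the function names here are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two equal-length, ungapped sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 70.0) -> bool:
    """True if the ungapped identity reaches the threshold (70% by default)."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    probe = "ACGTACGTAC"
    target = "ACGTTCGTAA"  # differs at 2 of 10 positions
    print(percent_identity(probe, target))          # 80.0
    print(meets_identity_threshold(probe, target))  # True
```

A 10-base probe that differs from its target at two positions scores 80%, above the 70% figure quoted for these conditions.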

[2329] In the following procedure, a probe for hybridization is produced in two PCR reactions using a pair of primers designed from Arabidopsis thaliana genomic sequence. As described above, the template for generating the probe can be any desired template.

[2330] The product of the first PCR is checked to confirm that it is of the expected size. Then the product of the first PCR is used as a template, with the same pair of primers used in the first PCR, in a second PCR that produces a labeled product used as the probe.

[2331] Fragments detected by hybridization, or other bands of interest, can be isolated from gels used to separate 
genomic DNA fragments by known methods for further purification and/or characterization. 

Buffers for nuclear DNA extraction

1. 10X HB

| Component | Amount per 1000 ml | Function |
|---|---|---|
| 40 mM spermidine (Sigma S-2501) | 10.2 g | Spermine and spermidine stabilize chromatin and the nuclear membrane |
| 10 mM spermine (Sigma S-2876) | 3.5 g | |
| 0.1 M EDTA (disodium) | 37.2 g | EDTA inhibits nuclease |
| 0.1 M Tris | 12.1 g | Buffer |
| 0.8 M KCl | 59.6 g | Adjusts ionic strength for stability of nuclei |

Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in leaves. Use of pH 9.5 appears to inactivate this nuclease.
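The masses in the buffer table above follow from mass = molarity x volume x molecular weight. The sketch below checks them; the molecular weights assume particular salt forms (spermidine trihydrochloride, spermine tetrahydrochloride, disodium EDTA dihydrate), which the protocol does not state but which reproduce the listed masses.

```python
# ASSUMED salt forms and molecular weights (g/mol); not stated in the protocol.
MOLECULAR_WEIGHTS = {
    "spermidine trihydrochloride": 254.6,
    "spermine tetrahydrochloride": 348.2,
    "EDTA disodium dihydrate": 372.24,
    "Tris base": 121.14,
    "KCl": 74.55,
}

# Molar concentrations from the buffer table (per 1000 ml).
RECIPE_M = {
    "spermidine trihydrochloride": 0.040,
    "spermine tetrahydrochloride": 0.010,
    "EDTA disodium dihydrate": 0.1,
    "Tris base": 0.1,
    "KCl": 0.8,
}


def grams_needed(molarity: float, volume_l: float, mw: float) -> float:
    """Grams of solute = mol/L x L x g/mol."""
    return molarity * volume_l * mw


if __name__ == "__main__":
    for name, molarity in RECIPE_M.items():
        grams = grams_needed(molarity, 1.0, MOLECULAR_WEIGHTS[name])
        print(f"{name}: {grams:.1f} g")
```

The same function gives the 14.9 g of disodium EDTA in the Sarkosyl solution (0.04 M x 1 L x 372.24 g/mol).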



2. 2 M sucrose (684 g per 1000 ml)

Heat about half the final volume of water to about 50°C. Add the sucrose slowly, then bring the mixture close to the final volume; stir constantly until the sucrose has dissolved. Bring the solution to volume.



3. Sarkosyl solution (lyses nuclear membranes)

| Component | Amount per 1000 ml |
|---|---|
| N-lauroyl sarcosine (Sarkosyl) | 20.0 g |
| 0.1 M Tris | 12.1 g |
| 0.04 M EDTA (disodium) | 14.9 g |

Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper volume.



4. 20% Triton X-100

80 ml Triton X-100
320 ml 1X HB (without β-ME and PMSF)

Prepare in advance; Triton takes some time to dissolve.



A. Procedure
[2333]

1. Prepare 1X HB buffer (keep ice-cold during use):

| Component | Amount per 1000 ml | Function |
|---|---|---|
| 2 M sucrose | 250 ml | A non-ionic osmoticum |
| Water | 634 ml | |
| Added just before use: | | |
| 100 mM PMSF* | 10 ml | A protease inhibitor; protects nuclear membrane proteins |
| β-mercaptoethanol | 1 ml | Inactivates nuclease by reducing disulfide bonds |

*100 mM PMSF (phenylmethylsulfonyl fluoride, Sigma P-7626): add 0.0875 g to 5 ml of 100% ethanol.

… the blenders in ice periodically, … but not nuclear, membranes.

… by gently squeezing the liquid through the filters.

5. Centrifuge the filtrate at 1200 x g for 20 min at 4°C to pellet the nuclei.

… chloroplasts. … After the pellets are resuspended …

Pellet the nuclei again at 1200-1300 x g. Discard the supernatant.

Repeat the wash … happens after 3 or 4 suspensions. At this point, … and mitochondria that contaminate the …

… 125 ml Erlenmeyer flask.

7. Add 15 ml, dropwise, of cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5) while swirling gently. This lyses the nuclei. The solution will become very viscous.

… white and viscous.



11. Add 20 µl of 10 mg/ml EtBr per ml of solution. … If too viscous, dilute it with TE (TE: 10 mM Tris, 1 mM EDTA …).

… A260 … concentration of the DNA. … Load 50 ng and …



Protocol for Digestion of Genomic DNA



[2334] 



… equivalent is given in Table 3 … not complete digestion.

Lambda DNA provides a useful control. … 20°C for at least two hours. Yeast DNA …

DNA … (be careful not to disturb the pellet).

Be sure that …

9. Precipitate the digested DNA … ethanol and incubating at -20°C for at least 2 hours (preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; they do not have to be precipitated.

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) and an appropriate volume of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Pipette the loading dye carefully: it is viscous, so be sure you are pipetting the correct volume.
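The volumes in step 10 scale linearly with the number of blots. A minimal sketch, with a hypothetical helper name, that also shows the resulting loading-dye strength per lane:

```python
def batch_volumes(te_per_blot_ul: float, dye_per_blot_ul: float, n_blots: int):
    """Total TE and 10X dye for n_blots lanes, plus the final dye strength (X)."""
    te_total = te_per_blot_ul * n_blots
    dye_total = dye_per_blot_ul * n_blots
    # A 10X stock diluted into (TE + dye) in each lane.
    final_dye_x = 10.0 * dye_per_blot_ul / (te_per_blot_ul + dye_per_blot_ul)
    return te_total, dye_total, final_dye_x


if __name__ == "__main__":
    te, dye, strength = batch_volumes(22.0, 2.4, 50)
    print(f"TE: {te:.0f} µl, dye: {dye:.0f} µl, dye per lane: {strength:.2f}X")
```

With 22 µl TE and 2.4 µl of 10X dye per blot, each lane ends up at roughly 1X dye (10 x 2.4 / 24.4 ≈ 0.98X).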



Table 3

Some guide points in digesting genomic DNA.

| Species | Genome Size | Size Relative to Arabidopsis | Genome Equivalent to 2 µg Arabidopsis DNA | Amount of DNA per blot |
|---|---|---|---|---|
| Arabidopsis | 120 Mb | 1X | 1X | 2 µg |
| Brassica | 1,100 Mb | 9.2X | 0.54X | 10 µg |
| Corn | 2,800 Mb | 23.3X | 0.43X | 20 µg |
| Cotton | 2,300 Mb | 19.2X | 0.52X | 20 µg |
| Oat | 11,300 Mb | 94X | 0.11X | 20 µg |
| Rice | 400 Mb | 3.3X | 0.75X | |
| Soybean | 1,100 Mb | 9.2X | 0.54X | 10 µg |
| Sugarbeet | 758 Mb | 6.3X | 0.8X | 10 µg |
| Sweetclover | 1,100 Mb | 9.2X | 0.54X | 10 µg |
| Wheat | 16,000 Mb | 133X | 0.08X | 20 µg |
| Yeast | 15 Mb | 0.12X | 1X | 0.25 µg |
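The derived columns of Table 3 follow from two ratios: genome size divided by the 120 Mb Arabidopsis genome, and the DNA loaded per blot divided by 2 µg times that relative size. The sketch below recomputes them; small last-digit differences (for example 0.55X versus the table's 0.54X for Brassica) arise because the table rounds the relative size before dividing. The function names are hypothetical.

```python
ARABIDOPSIS_GENOME_MB = 120.0
REFERENCE_UG = 2.0  # 2 µg of Arabidopsis DNA is the reference load


def size_relative_to_arabidopsis(genome_mb: float) -> float:
    return genome_mb / ARABIDOPSIS_GENOME_MB


def genome_equivalents(ug_per_blot: float, genome_mb: float) -> float:
    """Genome copies loaded, as a multiple of those in 2 µg of Arabidopsis DNA."""
    return ug_per_blot / (REFERENCE_UG * size_relative_to_arabidopsis(genome_mb))


if __name__ == "__main__":
    rows = [("Brassica", 1100, 10), ("Corn", 2800, 20), ("Wheat", 16000, 20), ("Yeast", 15, 0.25)]
    for species, genome_mb, ug in rows:
        rel = size_relative_to_arabidopsis(genome_mb)
        eq = genome_equivalents(ug, genome_mb)
        print(f"{species}: {rel:.1f}X size, {eq:.2f}X genome equivalents")
```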



Protocol for Southern Blot Analysis 



[2335] The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer. Low-voltage, overnight separations are preferred. The gels are stained with EtBr and photographed.

1. To blot the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for about 15 min.

2. Then briefly rinse with water. The DNA is denatured by 2 incubations: incubate (with shaking) in 0.5 M NaOH in 1.5 M NaCl for 15 min each.

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with shaking) in 1.5 M Tris, pH 7.5, in 1.5 M NaCl for 15 min each.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X SSC for at least 15 min before use. (20X SSC is 175.3 g NaCl and 88.2 g sodium citrate per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are removed. The DNA is blotted from the gel to the membrane using an absorbent medium, such as paper toweling, and 6X SSC buffer. After the transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The membrane is stored at 4°C until use.
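The 20X SSC recipe in step 4 follows from the 1X definition (0.15 M NaCl, 0.015 M sodium citrate) given later in this example. A sketch assuming trisodium citrate dihydrate (294.10 g/mol) and NaCl (58.44 g/mol), plus the C1V1 = C2V2 dilution used to make 6X SSC from the 20X stock; helper names are hypothetical.

```python
MW_NACL = 58.44                    # g/mol
MW_NA3_CITRATE_DIHYDRATE = 294.10  # g/mol, ASSUMED hydrate form


def ssc_grams_per_liter(strength_x: float):
    """Grams of NaCl and trisodium citrate dihydrate per liter of nX SSC.

    1X SSC = 0.15 M NaCl, 0.015 M sodium citrate.
    """
    nacl = 0.15 * strength_x * MW_NACL
    citrate = 0.015 * strength_x * MW_NA3_CITRATE_DIHYDRATE
    return nacl, citrate


def stock_volume_ml(final_x: float, stock_x: float, total_ml: float) -> float:
    """Volume of stock for a dilution (C1V1 = C2V2)."""
    return final_x * total_ml / stock_x


if __name__ == "__main__":
    nacl, citrate = ssc_grams_per_liter(20)
    print(f"20X SSC: {nacl:.1f} g NaCl, {citrate:.1f} g citrate per liter")
    print(f"6X SSC, 1 L: {stock_volume_ml(6, 20, 1000):.0f} ml of 20X stock")
```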



B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis



Amplification procedures:
[2336]

1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

| Volume | Stock | Final amount |
|---|---|---|
| 0.5 µl | ~10 ng/µl genomic DNA | 5 ng |
| 2.5 µl | 10X PCR buffer | 20 mM Tris, 50 mM KCl |
| 0.75 µl | 50 mM MgCl2 | 1.5 mM |
| 1 µl | 10 pmol/µl Primer 1 (Forward) | 10 pmol |
| 1 µl | 10 pmol/µl Primer 2 (Reverse) | 10 pmol |
| 0.5 µl | 5 mM dNTPs | 0.1 mM |
| 0.1 µl | 5 units/µl Platinum Taq™ DNA Polymerase (Life Technologies, Gaithersburg, MD) | 1 unit |
| (to 25 µl) | Water | |





2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

1) 94°C for 10 min, followed by
2) 5 cycles: 94°C - 30 sec; 62°C - 30 sec; 72°C - 3 min
3) 5 cycles: 94°C - 30 sec; 58°C - 30 sec; 72°C - 3 min
4) 25 cycles: 94°C - 30 sec; 53°C - 30 sec; 72°C - 3 min
5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.
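The program above is a step-down design: the annealing temperature falls from 62°C to 58°C to 53°C over three stages. The sketch below encodes it as data to count cycles and estimate time at temperature; the `Stage` structure is hypothetical, and ramp times, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stage:
    cycles: int
    steps: List[Tuple[int, int]]  # (temperature in °C, duration in seconds)


# Program transcribed from step 2 above.
PROGRAM = [
    Stage(1, [(94, 600)]),                       # initial denaturation
    Stage(5, [(94, 30), (62, 30), (72, 180)]),
    Stage(5, [(94, 30), (58, 30), (72, 180)]),
    Stage(25, [(94, 30), (53, 30), (72, 180)]),
    Stage(1, [(72, 420)]),                       # final extension
]


def amplification_cycles(program: List[Stage]) -> int:
    """Count cycles in the three-step (denature/anneal/extend) stages."""
    return sum(stage.cycles for stage in program if len(stage.steps) == 3)


def annealing_temperatures(program: List[Stage]) -> List[int]:
    """Middle (annealing) temperature of each three-step stage."""
    return [stage.steps[1][0] for stage in program if len(stage.steps) == 3]


def hold_time_minutes(program: List[Stage]) -> float:
    """Time at temperature; ramping and the final 4 °C hold are ignored."""
    seconds = sum(stage.cycles * sum(d for _, d in stage.steps) for stage in program)
    return seconds / 60


if __name__ == "__main__":
    print(amplification_cycles(PROGRAM))    # 35
    print(annealing_temperatures(PROGRAM))  # [62, 58, 53]
    print(hold_time_minutes(PROGRAM))       # 157.0
```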



[2337] The procedure can be adapted to a multi-well format if necessary. 
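For a multi-well format, the shared components are usually combined in a master mix. A sketch under stated assumptions: the per-reaction volumes come from the table in step 1, template and primers are treated as per-well additions, and the 10% overage is an arbitrary hypothetical choice.

```python
REACTION_UL = 25.0

# Per-reaction volumes (µl) transcribed from the table in step 1.
PER_REACTION_UL = {
    "genomic DNA (~10 ng/µl)": 0.5,
    "10X PCR buffer": 2.5,
    "50 mM MgCl2": 0.75,
    "primer 1 (10 pmol/µl)": 1.0,
    "primer 2 (10 pmol/µl)": 1.0,
    "5 mM dNTPs": 0.5,
    "Platinum Taq (5 U/µl)": 0.1,
}

# ASSUMPTION: template and primers change from well to well, so they are
# pipetted individually and kept out of the shared master mix.
PER_WELL = {"genomic DNA (~10 ng/µl)", "primer 1 (10 pmol/µl)", "primer 2 (10 pmol/µl)"}


def water_ul(per_reaction: dict) -> float:
    """Water needed to bring one reaction to the final volume."""
    return REACTION_UL - sum(per_reaction.values())


def final_concentration(stock: float, volume_ul: float) -> float:
    """Stock x volume / total volume, in the stock's units."""
    return stock * volume_ul / REACTION_UL


def master_mix(per_reaction: dict, n_wells: int, overage: float = 0.10) -> dict:
    """Scaled volumes (µl) of the shared components, plus water, for n_wells."""
    factor = n_wells * (1 + overage)
    mix = {name: vol * factor for name, vol in per_reaction.items() if name not in PER_WELL}
    mix["water"] = water_ul(per_reaction) * factor
    return mix


if __name__ == "__main__":
    print(f"MgCl2: {final_concentration(50, 0.75):.2f} mM")
    print(f"dNTPs: {final_concentration(5, 0.5):.2f} mM")
    for name, vol in master_mix(PER_REACTION_UL, 96).items():
        print(f"{name}: {vol:.1f} µl")
```

Each well then receives the master-mix share plus its own template and primers, which together return the reaction to 25 µl.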

Quantification and Dilution of PCR Products:

[2338]

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A linearized plasmid DNA can be used as a quantification standard … be used as references to … DNA is useful as a molecular weight marker. … is examined to determine whether the size of the … agrees with the expected size and whether there are significant extra bands or smeary products in the PCR reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard. A small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.
C. Protocol for PCR-DIG-Labeling of DNA

Solutions:
[2339]

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5 U/µl Platinum Taq Polymerase, and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875 mM dTTP, 0.125 mM DIG-11-dUTP)
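Each 10X mix keeps dTTP plus DIG-11-dUTP at 2 mM, matching the other dNTPs, and the labeling procedure pairs each mix with a probe length range (<1 kb, 1 kb to 1.8 kb, >1.8 kb). A small sketch of that selection; the function names are hypothetical and the handling of a probe of exactly 1.8 kb is an assumption.

```python
# dTTP and DIG-11-dUTP (mM) in each 10X mix; dATP, dCTP and dGTP are 2 mM in all three.
DIG_MIXES = {
    "1:5": {"dTTP": 1.65, "DIG-11-dUTP": 0.35},
    "1:10": {"dTTP": 1.81, "DIG-11-dUTP": 0.19},
    "1:15": {"dTTP": 1.875, "DIG-11-dUTP": 0.125},
}


def choose_mix(probe_length_bp: int) -> str:
    """Pick a mix from probe length: <1 kb, 1 to 1.8 kb, >1.8 kb."""
    if probe_length_bp < 1000:
        return "1:5"
    if probe_length_bp <= 1800:  # ASSUMPTION: exactly 1.8 kb counts as the middle range
        return "1:10"
    return "1:15"


def thymidine_total_mM(mix: str) -> float:
    """dTTP plus DIG-11-dUTP for a mix."""
    return sum(DIG_MIXES[mix].values())


if __name__ == "__main__":
    for length in (600, 1500, 2400):
        mix = choose_mix(length)
        print(length, mix, DIG_MIXES[mix]["DIG-11-dUTP"], round(thymidine_total_mM(mix), 3))
```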


TE buffer (10 mM Tris, 1 mM EDTA, pH 8) 

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176). Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from autoclaved solutions of 1 M Tris pH 9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved distilled water.

Procedure:



1. PCR reactions are performed in 25 µl volumes containing:

| Component | Amount |
|---|---|
| PCR buffer | 1X |
| MgCl2 | 1.5 mM |
| 10X dNTP + DIG-11-dUTP | 1X (please see the note below) |
| Platinum Taq™ Polymerase | 1 unit |
| Probe DNA | 10 pg |
| Primer 1 | 10 pmol |

| dNTP mix | Use for |
|---|---|
| 10X dNTP + DIG-11-dUTP (1:5) | <1 kb |
| 10X dNTP + DIG-11-dUTP (1:10) | 1 kb to 1.8 kb |
| 10X dNTP + DIG-11-dUTP (1:15) | >1.8 kb |



2. The PCR reaction uses the following amplification cycles:

1) 94°C for 10 min.
2) 5 cycles: 95°C - 30 sec; 61°C - 1 min; 73°C - 5 min
3) 5 cycles: 95°C - 30 sec; 59°C - 1 min; 75°C - 5 min
4) 25 cycles: 95°C - 30 sec; 51°C - 1 min; 73°C - 5 min
5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).



3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing them to an aliquot of the unlabeled probe starting material.

4. The amount of DIG-labeled probe is determined as follows:

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris and 1 mM EDTA, pH 8) as shown in the following table:



| DIG-labeled control DNA starting conc. | Stepwise Dilution | Final Conc. (Dilution Name) |
|---|---|---|
| 5 ng/µl | 1 µl in 49 µl TE | 100 pg/µl (A) |
| 100 pg/µl (A) | 25 µl in 25 µl TE | 50 pg/µl (B) |
| 50 pg/µl (B) | 25 µl in 25 µl TE | 25 pg/µl (C) |
| 25 pg/µl (C) | 20 µl in 30 µl TE | 10 pg/µl (D) |



a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg are spotted onto a positively charged nylon membrane, marking the membrane lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe are spotted.

c. The membrane is fixed by UV crosslinking.

d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution for 15 min at room temperature.

e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG antibody (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) and an NBT substrate according to the manufacturer's instructions.

f. … the control and experimental dilutions are then compared to estimate the concentration of the PCR-DIG-labeled probe.
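The dilution table in step 4 is a simple serial dilution: each step multiplies the concentration by transfer / (transfer + diluent). A minimal sketch, with hypothetical names, that reproduces dilutions A to D from the 5 ng/µl control:

```python
from typing import List, Tuple


def serial_dilution(start: float, steps: List[Tuple[float, float]]) -> List[float]:
    """Concentration after each (transfer volume, diluent volume) step."""
    concentrations = []
    conc = start
    for transfer_ul, diluent_ul in steps:
        conc = conc * transfer_ul / (transfer_ul + diluent_ul)
        concentrations.append(conc)
    return concentrations


# Control DNA at 5 ng/µl = 5000 pg/µl; steps A-D from the dilution table.
STEPS = [(1, 49), (25, 25), (25, 25), (20, 30)]

if __name__ == "__main__":
    for name, conc in zip("ABCD", serial_dilution(5000.0, STEPS)):
        print(f"{name}: {conc:g} pg/µl")
```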

D. Prehybridization and Hybridization of Southern Blots

Solutions:
[2341]

100% Formamide: purchased from Gibco

20X SSC (1X = 0.15 M NaCl, 0.015 M Na3 citrate), per L: 175 g NaCl, 87.5 g Na3 citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine)

20% SDS (sodium dodecyl sulphate)

… the volume to 100 ml with water. Stir and sterilize.



Prehybridization Mix:

| Final Concentration | Components | Volume (per 100 ml) | Stock |
|---|---|---|---|
| 50% | Formamide | 50 ml | 100% |
| 5X | SSC | 25 ml | 20X |
| 0.1% | Sarkosyl | 0.5 ml | 20% |
| 0.02% | SDS | 0.1 ml | 20% |
| 2% | Blocking Reagent | 20 ml | 10% |
| | Water | 4.4 ml | |
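Each volume in the prehybridization mix follows from C1V1 = C2V2, with water making up the balance to 100 ml. A sketch that recomputes the table (the helper names are hypothetical):

```python
def stock_volume_ml(final: float, stock: float, total_ml: float = 100.0) -> float:
    """C1V1 = C2V2: ml of stock that gives the final concentration in total_ml."""
    return final * total_ml / stock


# (component, final concentration, stock concentration); units match within each row.
PREHYB_MIX = [
    ("Formamide", 50.0, 100.0),       # %
    ("SSC", 5.0, 20.0),               # X
    ("Sarkosyl", 0.1, 20.0),          # %
    ("SDS", 0.02, 20.0),              # %
    ("Blocking Reagent", 2.0, 10.0),  # %
]


def prehyb_volumes(total_ml: float = 100.0) -> dict:
    """Stock volumes for each component plus the water needed to reach total_ml."""
    volumes = {name: stock_volume_ml(f, s, total_ml) for name, f, s in PREHYB_MIX}
    volumes["Water"] = total_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ml in prehyb_volumes().items():
        print(f"{name}: {ml:.1f} ml")
```

Passing a different `total_ml` scales every component, including water, for larger or smaller batches.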





General Procedures:
[2342]

1. … at least 2 hours with gentle agitation using a wave shaker.

2. Denature the DIG-labeled DNA probe by incubating for 10 min at 98°C using the PCR machine, and immediately cool it to 4°C.

3. Add the probe to the prehybridization solution (… 750 ng … probe) and mix well, but avoid foaming. Bubbles may lead to background.

4. Pour off the prehybridization solution from the hybridization bags and add the new prehybridization and probe solution mixture to the bags containing the membrane.

5. Incubate with gentle agitation for at least 16 hours.

6. Proceed to the medium stringency post-hybridization wash:

Three times for 20 min each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane.

To avoid background, keep the membranes fully submerged to avoid drying in spots; agitate sufficiently to avoid having membranes stick to one another.

7. After the wash, proceed to immunological detection and CSPD development. 
E. Procedure for Immunological Detection with CSPD

Solutions: 
[2343] 

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)

Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution: …

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5

Procedure:
[2344]



… shaking.

… blots.

5. The blots are equilibrated for 2-5 min in 60 ml detection buffer.

6. The blots are …

… let the membrane dry completely. …

10. Seal the damp membrane in a …

… 27°C in the dark.
[2347] The suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol. Biol. 36:163 (1998). Briefly, cells 4 days post-subculture are incubated for 5 hours with cell wall digestion solution containing 0.4 M sorbitol, 2% driselase, and 5 mM MES (2-[N-morpholino]ethanesulfonic acid), pH 5.0. The digested cells are pelleted gently at 60 x g for 5 min and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2 and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5 M mannitol, pH 5.7, and the protoplast density is adjusted to about 4 x 10… protoplasts per ml.

[2348] 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40% polyethylene glycol (MW 8000, PEG 8000) by gentle inversion a few times at room temperature for 5 to 25 min. Protoplast culture medium known in the art is added to the PEG-DNA-protoplast mixture. Protoplasts are incubated in the culture medium for 24 hours to 5 days, and cell extracts can be used for assay of transient expression of the introduced gene. Alternatively, transformed cells can be used to produce transgenic callus, which in turn can be used to produce transgenic plants by methods known in the art. See, for example, Nomura and Komamine, Plant Physiol. 79:988-991 (1985), Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension Cultures.

[2349] The invention being thus described, it will be apparent to one of ordinary skill in the art that various modifications of the materials and methods for practicing the invention can be made. Such modifications are to be considered within the scope of the invention as defined by the following claims.

[2350] Each of the references from the patent and periodical literature cited herein is hereby expressly incorporated 
in its entirety by such citation. 



1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof; or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof. 

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least 
65% sequence identity to 

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof. 

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least 
65% sequence identity to a gene comprising 

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof. 

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence order of said isolated nucleotide sequence according to any one of claims 1-3.

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a 
sequence selected from the group consisting of: 

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and 

(b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table 
1 or 2; 

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40°C to 48°C below the melting temperature of the nucleic acid duplex.

6. The nucleic acid molecule according to any one of claims 1-5, wherein said nucleic acid comprises an open reading frame.
… acid molecule is heterologous to any element in said vector construct.

11. A vector construct according to claim 10 comprising:

… claims 1-4;

…gous to any element in said vector construct,

… acid.

14. A vector construct according to claim 10 comprising:

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim 7; and

(d) a second nucleic acid;

… to any element in said vector construct.

… acid.

… acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of any one of claims 10-16.
19. An isolated polypeptide comprising an amino acid sequence …

… to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22. 

24. A method of introducing an isolated nucleic acid into a host cell comprising: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) contacting said isolated nucleic acid with said host cell under conditions that permit insertion of said nucleic acid into said host cell.

25. A method of transforming a host cell which comprises contacting a host cell with a vector construct according to 
any one of claims 10-16. 

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising: 

(a) providing the host cell of claim 24 or 25; and 

(b) culturing said host cell under conditions that permit transcription or translation. 

27. A method for detecting a nucleic acid in a sample which comprises: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5; 
(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison. 

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

29. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 which is 
exogenous to said plant or plant cell. 

30. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4, wherein said nucleic acid molecule is heterologous to said plant or said cell of a plant.

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33. 
Sequence-determined DNA fragments and corresponding polypeptides encoded thereby



The Search Division considers that the present European patent application does not comply with the requirements of unity of invention and relates to several inventions or groups of inventions, namely:

1. Claims: Invention 1: Claims 1-34 all partially 

Nucleic acid molecules and corresponding encoded polypeptide
sequences as defined by SEQ ID NOS: 1-4, and sequences with
identities thereto as defined by the
claims, vectors, methods, hosts, and plants comprising said
sequences and antibodies to said polypeptides



2. Claims: Inventions 2-80256: Claims 1-34 all partially as
appropriate for either nucleic or polypeptide 
sequences 



Nucleic acid or polypeptide sequences as defined by SEQ ID
NOS: 5-80259 such that each consecutive sequence relates to
an individual invention where invention 2 relates to SEQ ID
NO: 5, invention 3 to SEQ ID NO: 6 continuing through to
invention 80256 and SEQ ID NO: 80259, and the corresponding
methods, vectors, hosts and antibodies relating to said
sequences.



It should be noted that for the first invention both nucleic
acid and the corresponding encoded polypeptide sequences
have been grouped as one (SEQ ID NOS: 1-4). In view of the
unreasonable burden required to regroup the remaining
nucleic acid sequences with their corresponding encoded
polypeptides, each of these sequences has been communicated
as a separate invention. However, in the event that further
search fees are paid for subsequent inventions, the search
division is willing to regroup sequences on the basis of any
one nucleic acid sequence in combination with its encoded
polypeptide(s), in so far as such sequences have not been
previously disclosed. In effect, any subsequent search for any
individual nucleic acid sequence (with its encoded
polypeptides) will be subject to an additional search fee.
Should additional fees be paid, then the applicant is kindly
requested to indicate the corresponding encoded polypeptide
sequences for each nucleic acid sequence to be searched.